Heterogeneous Document Embeddings for Cross-Lingual Text Classification

Funnelling (Fun) is a method for cross-lingual text classification (CLC) based on a two-tier ensemble for heterogeneous transfer learning. In Fun, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLC systems in which these correlations cannot be leveraged. Here we describe Generalized Funnelling (gFun), a learning ensemble in which the meta-classifier receives as input the above vector of calibrated posterior probabilities, concatenated with document embeddings (aligned across languages) that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings) and word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings). We show that gFun improves on Fun by describing experiments on two large, standard multilingual datasets for multi-label text classification.
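
The input that the gFun meta-classifier receives can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `gfun_meta_input`, the toy values, and the dimensionalities are hypothetical, and the embedding views are assumed to be precomputed and already aligned across languages. The sketch only shows the concatenation step described in the abstract, which yields feature vectors of the same length for documents in any language.

```python
from typing import List, Sequence


def gfun_meta_input(posteriors: Sequence[float],
                    *embedding_views: Sequence[float]) -> List[float]:
    """Build one meta-classifier input vector for a single document.

    posteriors: calibrated posterior probabilities, one per class, as
        returned by the 1st-tier classifier for the document's language.
        In the multi-label setting they need not sum to 1.
    embedding_views: any number of document embeddings (assumed to be
        aligned across languages), e.g. a word-class view and a
        word-word view.
    """
    if any(not 0.0 <= p <= 1.0 for p in posteriors):
        raise ValueError("posteriors must be probabilities in [0, 1]")
    features = list(posteriors)
    for view in embedding_views:
        features.extend(view)  # plain concatenation of the views
    return features


# Hypothetical toy values: 3 classes, a 2-dimensional word-class view,
# and a 4-dimensional word-word view.
english_doc = gfun_meta_input([0.9, 0.1, 0.4], [0.2, -0.1], [0.5, 0.0, -0.3, 0.7])
italian_doc = gfun_meta_input([0.2, 0.8, 0.3], [-0.4, 0.6], [0.1, 0.9, 0.2, -0.5])

# Documents in different languages end up in the same feature space,
# so a single meta-classifier can be trained on all of them.
assert len(english_doc) == len(italian_doc) == 9
```

Because every component of the vector is language-independent (class posteriors and cross-lingually aligned embeddings), one meta-classifier can be trained on the union of the training documents of all languages.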

Item URL

https://data.d4science.org/ctlg/SoBigDataLiteracy/heterogeneous_document_embeddings_for_cross-lingual_text_classification

Additional Info

Field	Value
Creator	Moreo, Alejandro
Creator	Pedrotti, Andrea
Creator	Sebastiani, Fabrizio, fabrizio.sebastiani@isti.cnr.it
DOI	https://doi.org/10.1145/3412841.3442093
Group	Social Impact of AI and explainable ML
Publisher	Proceedings of the 36th ACM/SIGAPP Symposium On Applied Computing (SAC 2021)
Source	SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing. March 2021, Pages 685–688
Thematic Cluster	Other
system:type	ConferencePaper

Management Info

Field	Value
Author	Wright Joanna
Maintainer	Sebastiani Fabrizio
Version	1
Last Updated	8 September 2023, 17:00 (CEST)
Created	6 May 2021, 15:19 (CEST)