Cross language Text Categorization by acquiring Multilingual Domain Models from Comparable Corpora

Abstract

In a multilingual scenario, the classical monolingual text categorization problem can be reformulated as a cross language TC task, in which we have to cope with two or more languages (e.g. English and Italian). In this setting, the system is trained using labeled examples in a source language (e.g. English), and it classifies documents in a different target language (e.g. Italian). In this paper we propose a novel approach to solve the cross language text categorization problem based on acquiring Multilingual Domain Models from comparable corpora in a totally unsupervised way and without using any external knowledge source (e.g. bilingual dictionaries). These Multilingual Domain Models are exploited to define a generalized similarity function (i.e. a kernel function) among documents in different languages, which is used inside a Support Vector Machines classification framework. The results show that our approach is a feasible and cheap solution that largely outperforms a baseline.
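
The generalized similarity function mentioned in the abstract can be pictured with a small sketch. This is not the authors' implementation: the vocabulary, the two domains, and the hand-set weights in `DOMAIN_MATRIX` are all hypothetical, and the acquisition of the model from comparable corpora is not shown. The sketch only illustrates the general idea that documents in different languages can be compared once their terms are mapped into a shared domain space.

```python
import math

# Toy multilingual vocabulary (hypothetical): two English and two Italian terms.
VOCAB = ["bank", "goal", "banca", "rete"]

# Hypothetical domain matrix: one row per term, one column per domain
# (ECONOMY, SPORT). The abstract describes acquiring such a model from
# comparable corpora; here the weights are simply set by hand.
DOMAIN_MATRIX = [
    [1.0, 0.0],  # bank  (en)
    [0.0, 1.0],  # goal  (en)
    [1.0, 0.0],  # banca (it)
    [0.0, 1.0],  # rete  (it)
]


def bag_of_words(tokens):
    """Term-frequency vector over VOCAB; out-of-vocabulary tokens are ignored."""
    return [float(tokens.count(term)) for term in VOCAB]


def to_domain_space(tf_vector):
    """Project a term-frequency vector onto the domain dimensions."""
    n_domains = len(DOMAIN_MATRIX[0])
    return [
        sum(tf * row[k] for tf, row in zip(tf_vector, DOMAIN_MATRIX))
        for k in range(n_domains)
    ]


def domain_kernel(tokens_a, tokens_b):
    """Cosine similarity of two tokenized documents in the domain space."""
    a = to_domain_space(bag_of_words(tokens_a))
    b = to_domain_space(bag_of_words(tokens_b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # at least one document has no in-vocabulary terms
    return sum(x * y for x, y in zip(a, b)) / norm
```

With these toy weights, an English document containing only "bank" and an Italian document containing only "banca" share no surface terms, yet they receive maximal similarity because both map to the same domain. Evaluating `domain_kernel` on every pair of documents would give a Gram matrix that could be passed to an SVM implementation accepting precomputed kernels.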

Bibliographic details

Source: 43rd Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference, 2005, pp. 9-16 (8 pages)
Authors: Alfio Gliozzo; Carlo Strapparava
Original format: PDF

Similar documents

1. Categorization of Unorganized Text Corpora for better Domain-Specific Language Modeling [J]. Advances in Electrical and Electronic Engineering. 2013, no. 5
2. Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework [J]. Razieh Rahimi, Azadeh Shakery, Irwin King. Information Processing & Management. 2016, no. 2
3. Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora [J]. Ivan Vulić, Wim De Smet, Marie-Francine Moens. Information Retrieval. 2013, no. 3
4. Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization [C]. Alfio Gliozzo, Carlo Strapparava. 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING.ACL 2006), vol. 1. 2006
5. Parallel Sentence Detection in Comparable Corpora with Bilingual Word Embeddings for Low-Resource Languages [D]. Cadigan, John. 2018
6. Cross-Domain Authorship Attribution Using Pre-trained Language Models [O]. Georgios Barlas, Efstathios Stamatatos
7. Cross language Text Categorization by acquiring Multilingual Domain Models from Comparable Corpora [O]. Alfio Gliozzo. 2005
