Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization

Abstract

Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g., English) when the system is trained on labeled documents in a source language (e.g., Italian). In this work we present several solutions, depending on the availability of bilingual resources, and we show that the problem can be addressed even when no such resources are accessible. The core technique relies on the automatic acquisition of Multilingual Domain Models from comparable corpora. Experiments show the effectiveness of our approach, which provides a low-cost solution for the Cross-language Text Categorization task. In particular, when bilingual dictionaries are available, categorization performance gets close to that of monolingual text categorization.