Conference proceedings: Advances in natural language processing

Using Information from the Target Language to Improve Crosslingual Text Classification

Abstract

Crosslingual text classification consists of exploiting labeled documents in a source language to classify documents in a different target language. Beyond the evident translation problem, the task is complicated by cultural discrepancies between the two languages, which show up as different topic distributions. Such discrepancies make a classifier trained on the source language unreliable for the categorization task. To tackle this problem, we propose to improve classification performance by using information embedded in the target dataset itself. The central idea of the proposed approach is that similar documents must belong to the same category. It therefore classifies each document by considering not only its own content but also the categories assigned to other similar documents from the same target dataset. Experimental results on three different languages support the appropriateness of the proposed approach.
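The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea it describes: start from the class scores a source-trained classifier assigns to each target document, then revise each label using the categories currently assigned to its most similar target documents. The function name `neighborhood_reclassify`, the use of cosine similarity, and the parameters `k`, `weight`, and `iterations` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def neighborhood_reclassify(doc_vectors, initial_scores, k=2, weight=0.5, iterations=3):
    """Hypothetical sketch: combine each document's own class scores with
    the labels of its k most similar documents in the same target dataset."""
    X = np.asarray(doc_vectors, dtype=float)
    scores = np.asarray(initial_scores, dtype=float)

    # Cosine similarity between all pairs of target documents (an assumption;
    # the abstract does not name a similarity measure).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = X / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a document is not its own neighbor

    # Indices of the k most similar documents for each document.
    neighbors = np.argsort(-sim, axis=1)[:, :k]

    n_classes = scores.shape[1]
    labels = scores.argmax(axis=1)  # labels from the document's own content
    for _ in range(iterations):
        # Fraction of neighbors currently assigned to each class.
        votes = np.eye(n_classes)[labels[neighbors]].mean(axis=1)
        combined = (1 - weight) * scores + weight * votes
        new_labels = combined.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels

# Toy target dataset: two groups of three documents in a 2-d feature space.
vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
           [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]]
# Hypothetical scores from a source-trained classifier; the third document
# leans slightly toward class 1 even though it resembles the first group.
scores = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55],
          [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]

labels = neighborhood_reclassify(vectors, scores)
print(labels)
```

In this toy input the third document, initially scored slightly toward class 1, is moved to class 0 because both of its nearest neighbors carry that label. The equal weighting between a document's own scores and its neighbors' votes is an arbitrary choice here; how the two sources of evidence are balanced in the proposed approach is not stated in the abstract.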