
Density Based Active Self-training for Cross-Lingual Sentiment Classification



Abstract

Cross-lingual sentiment classification aims to use annotated sentiment resources in one language (typically English) for sentiment classification in another language. Most existing work relies on automatic machine translation services to project information directly from one language to another. However, because machine translation quality is still far from satisfactory and term distributions may differ across languages, these techniques cannot match the performance of monolingual approaches. To overcome these limitations, we propose a novel learning model that combines active learning and self-training to incorporate unlabeled data from the target language into the learning process. The model also takes the density of unlabeled data into account, so that active learning avoids selecting outliers. We applied the proposed model to book review datasets in two different languages. Experiments showed that it effectively reduces labeling effort in comparison with some baseline methods.
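
The abstract does not give the selection formulas, so the following is only a minimal sketch of one common way to combine the three ingredients it names. It assumes binary sentiment labels, binary entropy as the uncertainty measure, and average cosine similarity to the other unlabeled examples as the density measure. The function names (`density`, `select`) and parameters (`n_query`, `n_self`) are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(vectors):
    """Average cosine similarity of each vector to all the others (assumed density measure)."""
    n = len(vectors)
    if n < 2:
        return [1.0] * n
    return [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def binary_entropy(p):
    """Uncertainty of a binary prediction with positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select(vectors, pos_probs, n_query=1, n_self=1):
    """Split unlabeled examples into human queries and self-training pseudo-labels.

    Active learning: query the examples with the highest uncertainty * density,
    so uncertain outliers rank below uncertain examples in dense regions.
    Self-training: pseudo-label the most confident of the remaining examples.
    """
    dens = density(vectors)
    scores = [binary_entropy(p) * d for p, d in zip(pos_probs, dens)]
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    query = order[:n_query]
    remaining = [i for i in range(len(vectors)) if i not in query]
    confident = sorted(
        remaining, key=lambda i: abs(pos_probs[i] - 0.5), reverse=True
    )[:n_self]
    pseudo = [(i, 1 if pos_probs[i] >= 0.5 else 0) for i in confident]
    return query, pseudo


# Toy data: examples 0-2 form a cluster, example 3 is an outlier.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
pos_probs = [0.5, 0.95, 0.9, 0.5]  # classifier output P(positive) per example
query, pseudo = select(vectors, pos_probs)
```

In this toy input, examples 0 and 3 are equally uncertain, but example 3 lies far from the cluster, so the density weight steers the human query toward example 0. In a full loop, the classifier would be retrained on the newly labeled and pseudo-labeled examples and the selection repeated.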
