Conference: Workshop on Computational Approaches to Code Switching

Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings



Abstract

This paper investigates the use of unsupervised cross-lingual embeddings for understanding code-mixed social media text. We specifically investigate these embeddings in a sentiment analysis task for Hinglish tweets, viz. English combined with (transliterated) Hindi. In a first step, baseline models were trained, initialized with monolingual embeddings obtained from large collections of tweets in English and code-mixed Hinglish. In a second step, two systems using cross-lingual embeddings were investigated: (1) a supervised classifier and (2) a transfer learning approach trained on English sentiment data and evaluated on code-mixed data. We demonstrate that incorporating cross-lingual embeddings improves the results (F1-score of 0.635 versus a monolingual baseline of 0.616), without requiring any parallel data to train the cross-lingual embeddings. In addition, the results show that the cross-lingual embeddings not only improve the results in a fully supervised setting, but can also be used as a base for distant supervision, by training a sentiment model in one of the source languages and evaluating on the other language projected into the same space. The transfer learning experiments result in an F1-score of 0.556, which is almost on par with the supervised settings and speaks to the robustness of the cross-lingual embeddings approach.
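The transfer setting described in the abstract (train on English, evaluate on code-mixed text in the same embedding space) can be illustrated with a toy sketch. This is not the authors' model. It assumes a shared cross-lingual space already exists, uses made-up 2-D vectors in place of learned embeddings, and swaps in a simple nearest-centroid classifier. The names `SHARED_SPACE`, `embed`, `train`, and `predict` are hypothetical.

```python
import math

# Hypothetical toy vectors in a shared 2-D cross-lingual space.
# Real embeddings would be learned from tweets and aligned without parallel data;
# these values are invented purely for illustration.
SHARED_SPACE = {
    # English words
    "good": (0.9, 0.1), "great": (0.8, 0.2),
    "bad": (-0.9, 0.1), "awful": (-0.8, -0.1),
    "movie": (0.0, 0.9),
    # Transliterated Hindi words, assumed already projected into the same space
    "accha": (0.85, 0.15), "bura": (-0.85, 0.05),
    "bekaar": (-0.7, 0.0), "film": (0.0, 0.85),
}


def embed(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [SHARED_SPACE[t] for t in tokens if t in SHARED_SPACE]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def train(examples):
    """Compute one centroid per sentiment label from (tokens, label) pairs."""
    by_label = {}
    for tokens, label in examples:
        by_label.setdefault(label, []).append(embed(tokens))
    return {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in by_label.items()
    }


def predict(model, tokens):
    """Return the label whose centroid is closest to the tweet vector."""
    vector = embed(tokens)
    return min(model, key=lambda label: math.dist(vector, model[label]))


# Train on English sentiment examples only.
english_train = [
    ("good movie".split(), "pos"),
    ("great movie".split(), "pos"),
    ("bad movie".split(), "neg"),
    ("awful movie".split(), "neg"),
]
model = train(english_train)

# Evaluate on Hinglish input that the classifier never saw during training.
for tweet in ["movie accha hai", "film bekaar hai"]:
    print(tweet, "->", predict(model, tweet.split()))
```

Because both languages share one vector space, the classifier needs no Hinglish labels. The toy example shows only that mechanism and says nothing about the reported scores.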
