Conference: Workshop on Speech and Language Technologies for Dravidian Languages

Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags

Abstract

Over the years, sentiment analysis tools and models have been developed extensively for European languages. Similar tools for Indian languages, by contrast, are scarce, because state-of-the-art pre-processing tools such as POS taggers and shallow parsers are not readily available for them. Such tools do exist for Indian languages spoken by a majority of the population, such as Hindi and Bengali, but they are hard to find for less widely spoken languages such as Tamil, Telugu, and Malayalam. Moreover, with the advent of social media, India's multilingual population, comfortable in both English and their regional language, prefers to communicate by mixing the two. This produces a large volume of code-mixed content, and automatically annotating it with sentiment labels is a challenging task. In this work, we take up this challenge by developing a sentiment analysis model for English-Tamil code-mixed data. The proposed approach uses bi-directional LSTMs together with language tagging. Traditional methods based on classical machine learning algorithms, which have also been discussed in the literature, serve as the baseline systems against which we compare our neural network model. The neural network model achieved precision, recall, and F1 scores of 0.59, 0.66, and 0.58, respectively.
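The abstract does not describe how the language tags are assigned. A minimal sketch, assuming a toy lexicon lookup over romanized tweets, is shown here; the word lists, the tag names `en`, `ta`, `other`, and `unk`, and the helpers `tokenize` and `tag_tokens` are hypothetical and not taken from the paper, which may instead use a trained language identifier.

```python
import re
from typing import List

# Hypothetical toy lexicons for romanized English-Tamil text; a practical
# system would rely on a trained language identifier or far larger lists.
ENGLISH_WORDS = {"the", "movie", "is", "waiting", "for", "trailer", "super", "very"}
TAMIL_WORDS = {"padam", "semma", "nalla", "iruku", "thalaivar", "vera"}


def tokenize(text: str) -> List[str]:
    """Lowercase the tweet and split it into word, hashtag, and mention tokens."""
    return re.findall(r"[#@]?\w+", text.lower())


def tag_tokens(tokens: List[str]) -> List[str]:
    """Assign one hypothetical language tag per token via lexicon lookup."""
    tags = []
    for tok in tokens:
        if tok.startswith(("#", "@")) or not tok.isalpha():
            tags.append("other")  # hashtags, mentions, numbers
        elif tok in TAMIL_WORDS:
            tags.append("ta")
        elif tok in ENGLISH_WORDS:
            tags.append("en")
        else:
            tags.append("unk")  # word found in neither lexicon
    return tags


tokens = tokenize("Padam semma super! Waiting for the trailer #Thalaivar")
print(list(zip(tokens, tag_tokens(tokens))))
```

Here `padam` and `semma` are tagged `ta`, the English words `en`, and the hashtag `other`; these per-token tags are the kind of side information a downstream classifier can consume.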
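The abstract names bi-directional LSTMs combined with language tags but gives no architecture details. The following is a minimal, untrained forward-pass sketch in plain Python, assuming that each token's input is its word embedding concatenated with a one-hot language tag, and that the final forward and backward hidden states are concatenated and fed to a softmax layer. The class names, tag set, dimensions, and three-class output are hypothetical; the weights are random, so the probabilities carry no meaning until the model is trained (training is omitted).

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical language-tag inventory; each token gets a one-hot vector over it.
LANG_TAGS = ["en", "ta", "other", "unk"]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def one_hot(tag: str) -> Vector:
    return [1.0 if t == tag else 0.0 for t in LANG_TAGS]


class LSTMCell:
    """A single-direction LSTM with random, untrained weights."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input, forget, output, candidate.
        self.params = {
            gate: (rand_matrix(hidden_dim, input_dim, rng),
                   rand_matrix(hidden_dim, hidden_dim, rng),
                   [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")
        }

    def _preact(self, gate: str, x: Vector, h: Vector) -> Vector:
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def run(self, inputs: Sequence[Vector]) -> Vector:
        """Process the sequence in order and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in inputs:
            i = [sigmoid(z) for z in self._preact("i", x, h)]
            f = [sigmoid(z) for z in self._preact("f", x, h)]
            o = [sigmoid(z) for z in self._preact("o", x, h)]
            g = [math.tanh(z) for z in self._preact("g", x, h)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


class BiLSTMSentiment:
    """Word embedding + one-hot language tag -> BiLSTM -> softmax (untrained)."""

    def __init__(self, vocab: List[str], emb_dim: int, hidden_dim: int,
                 n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.emb: Dict[str, Vector] = {
            w: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for w in vocab}
        self.unk = [0.0] * emb_dim  # zero vector for out-of-vocabulary words
        in_dim = emb_dim + len(LANG_TAGS)
        self.fwd = LSTMCell(in_dim, hidden_dim, rng)
        self.bwd = LSTMCell(in_dim, hidden_dim, rng)
        self.out = rand_matrix(n_classes, 2 * hidden_dim, rng)

    def features(self, tokens: List[str], tags: List[str]) -> Vector:
        # Each time step's input is [word embedding ; one-hot language tag].
        xs = [self.emb.get(tok, self.unk) + one_hot(tag)
              for tok, tag in zip(tokens, tags)]
        # Final forward state concatenated with final backward state.
        return self.fwd.run(xs) + self.bwd.run(xs[::-1])

    def predict_proba(self, tokens: List[str], tags: List[str]) -> Vector:
        logits = matvec(self.out, self.features(tokens, tags))
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


model = BiLSTMSentiment(["padam", "semma", "super"], emb_dim=4, hidden_dim=3, n_classes=3)
print(model.predict_proba(["padam", "semma", "super"], ["ta", "ta", "en"]))
```

Concatenating the forward state (which has read the tweet left to right) with the backward state (right to left) gives the classifier context from both directions, while the one-hot tag tells the recurrent cells which language each token was tagged as.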
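The three reported numbers are not related by the usual harmonic-mean formula: 2 × 0.59 × 0.66 / (0.59 + 0.66) ≈ 0.62, not 0.58. That is consistent with scores averaged over the sentiment classes (for example macro or support-weighted averaging), where F1 is computed per class and then averaged, so it can fall below both averaged precision and recall; the abstract does not state which scheme was used. The sketch here, with hypothetical function names and made-up toy labels (not the paper's data), illustrates the effect.

```python
from collections import Counter
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def per_class_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, Scores]:
    """Precision, recall, and F1 for every label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for lab in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == lab and p == lab)
        fp = sum(1 for t, p in pairs if t != lab and p == lab)
        fn = sum(1 for t, p in pairs if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = (prec, rec, f1)
    return scores


def averaged(y_true: List[str], y_pred: List[str], weighted: bool = False) -> Scores:
    """Macro average (equal weight per class) or support-weighted average.

    F1 is averaged per class, not recomputed from the averaged P and R.
    """
    scores = per_class_scores(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    weights = {lab: (support[lab] / n if weighted else 1 / len(scores)) for lab in scores}
    return tuple(sum(weights[lab] * scores[lab][k] for lab in scores) for k in range(3))


# Toy labels: "pos" has P = 1.0, R = 0.2; "neg" has P = 0.2, R = 1.0.
y_true = ["pos"] * 5 + ["neg"]
y_pred = ["pos"] + ["neg"] * 5
print(tuple(round(x, 3) for x in averaged(y_true, y_pred)))  # (0.6, 0.6, 0.333)
```

Each class here has F1 ≈ 0.333, so the macro F1 sits well below the macro precision and recall of 0.6. For real evaluations, scikit-learn's `precision_recall_fscore_support` with `average="macro"` or `average="weighted"` performs the same per-class averaging.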