Sentiment Analysis of Dravidian Code Mixed Data

Abstract

This paper presents the methods used to classify Dravidian code-mixed comments by polarity. Using the available code-mixed Tamil and Malayalam datasets, three methods are proposed: a sub-word level model, a word embedding based model, and a machine learning based architecture. The sub-word and word embedding based models use a Long Short-Term Memory (LSTM) network together with language-specific preprocessing, while the machine learning model uses term frequency-inverse document frequency (TF-IDF) vectorization with a Logistic Regression classifier. The sub-word level model was submitted to the track 'Sentiment Analysis for Dravidian Languages in Code-Mixed Text' organized by the Forum for Information Retrieval Evaluation in 2020 (FIRE 2020), where it ranked 5th on the Tamil task and 12th on the Malayalam task. This paper improves on those submitted results, reaching final weighted F1-scores of 0.65 for the Tamil task and 0.68 for the Malayalam task. The Tamil score equals that of the highest-ranked team in the Tamil track.
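The results above are reported as weighted F1-scores. As a minimal sketch of how that metric is commonly computed (the paper's own evaluation code is not shown here), the per-class F1 values are averaged with weights proportional to each class's number of true examples. The function name `weighted_f1` and the toy labels below are illustrative assumptions, not taken from the paper or the FIRE 2020 datasets.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")

    support = Counter(y_true)    # true examples per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of true examples.
        score += (n_true / total) * f1
    return score


# Toy labels, invented for illustration only.
y_true = ["pos", "pos", "pos", "neg", "mixed"]
y_pred = ["pos", "pos", "neg", "neg", "pos"]
print(f"{weighted_f1(y_true, y_pred):.3f}")  # prints 0.533
```

Because the weights follow class frequency, a class the model never predicts correctly (here `mixed`) lowers the score only in proportion to how often it occurs, which matters on imbalanced sentiment data.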
