Sentiment Analysis of Dravidian Code Mixed Data


Abstract

This paper presents the methods used to classify Dravidian code-mixed comments by polarity. Using the available code-mixed Tamil and Malayalam datasets, three methods are proposed: a sub-word level model, a word embedding based model, and a machine learning based architecture. The sub-word and word embedding based models use a Long Short-Term Memory (LSTM) network along with language-specific preprocessing, while the machine learning model uses term frequency-inverse document frequency (TF-IDF) vectorization with a Logistic Regression classifier. The sub-word level model was submitted to the track 'Sentiment Analysis for Dravidian Languages in Code-Mixed Text' proposed by the Forum for Information Retrieval Evaluation in 2020 (FIRE 2020). Although it ranked 5th and 12th for the Tamil and Malayalam tasks respectively in the FIRE 2020 track, this paper improves upon those results, attaining final weighted F1-scores of 0.65 for the Tamil task and 0.68 for the Malayalam task. The former score equals that attained by the highest ranked team in the Tamil track.
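The weighted F1-score quoted in the abstract is usually understood as the mean of per-class F1 scores, each weighted by the number of true examples of that class. The sketch below illustrates that common definition in plain Python; it is not the authors' evaluation code. The function name `weighted_f1`, the label names, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Assumption: classes that appear only in the predictions have zero
    support and therefore contribute nothing to the average.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n_true * f1       # weight each class F1 by its support
    return total / len(y_true)


# Toy example with hypothetical polarity labels (not data from the paper).
y_true = ["Positive", "Positive", "Positive", "Negative", "Negative", "Mixed"]
y_pred = ["Positive", "Positive", "Negative", "Negative", "Mixed", "Mixed"]
score = weighted_f1(y_true, y_pred)
print(round(score, 3))  # 0.678
```

In the toy example the per-class F1 scores are 0.8, 0.5, and about 0.667, with supports of 3, 2, and 1, so the weighted average is 61/90. Because the weights follow class frequency, a frequent class dominates the score, which matters when the polarity classes are imbalanced.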


Bibliographic details

Source: Workshop on Speech and Language Technologies for Dravidian Languages, 2021, pp. 46-54 (9 pages)
Authors: Asrita Venkata Mandalam; Yashvardhan Sharma
Original format: PDF

