Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings



Abstract

This paper investigates the use of unsupervised cross-lingual embeddings for understanding code-mixed social media text. We specifically investigate these embeddings for a sentiment analysis task on Hinglish tweets, i.e., English combined with (transliterated) Hindi. In a first step, baseline models were trained, initialized with monolingual embeddings obtained from large collections of tweets in English and code-mixed Hinglish. In a second step, two systems using cross-lingual embeddings were investigated: (1) a supervised classifier and (2) a transfer learning approach trained on English sentiment data and evaluated on code-mixed data. We demonstrate that incorporating cross-lingual embeddings improves the results (F1-score of 0.635 versus a monolingual baseline of 0.616), without requiring any parallel data to train the cross-lingual embeddings. In addition, the results show that the cross-lingual embeddings not only improve the results in a fully supervised setting, but can also be used as a base for distant supervision, by training a sentiment model in one of the source languages and evaluating it on the other language projected into the same space. The transfer learning experiments result in an F1-score of 0.556, which is almost on par with the supervised settings and speaks to the robustness of the cross-lingual embeddings approach.
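The transfer setting described in the abstract (train on English, evaluate on code-mixed text in a shared space) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two-dimensional vectors in `SHARED_SPACE` are hand-written toy values standing in for already-aligned cross-lingual embeddings, the nearest-centroid classifier is a stand-in for whatever model the authors used, and macro-averaging is assumed because the abstract only reports an "F1-score". It is not the paper's implementation.

```python
from collections import defaultdict

# Hypothetical toy vectors standing in for a shared cross-lingual space.
# In the paper's setting these would come from unsupervised alignment of
# English and Hinglish tweet embeddings; here they are hand-written.
SHARED_SPACE = {
    "good": [0.9, 0.1], "great": [0.8, 0.2],
    "bad": [0.1, 0.9], "awful": [0.2, 0.8],
    "accha": [0.85, 0.15], "badhiya": [0.8, 0.1],
    "bura": [0.15, 0.85], "bekar": [0.1, 0.8],
    "movie": [0.5, 0.5], "film": [0.5, 0.5],
}
DIM = 2


def embed(tweet):
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    vecs = [SHARED_SPACE[tok] for tok in tweet.lower().split() if tok in SHARED_SPACE]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train_centroids(tweets, labels):
    """Compute one mean vector per sentiment label (nearest-centroid classifier)."""
    grouped = defaultdict(list)
    for tweet, label in zip(tweets, labels):
        grouped[label].append(embed(tweet))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, tweet):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = embed(tweet)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (averaging scheme is an assumption)."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Train on English only, then evaluate on Hinglish in the same space.
english_tweets = ["good movie", "great film", "bad movie", "awful film"]
english_labels = ["positive", "positive", "negative", "negative"]
hinglish_tweets = ["film accha hai", "movie bekar hai", "badhiya film", "bura movie"]
hinglish_labels = ["positive", "negative", "positive", "negative"]

centroids = train_centroids(english_tweets, english_labels)
predictions = [predict(centroids, t) for t in hinglish_tweets]
print(predictions, macro_f1(hinglish_labels, predictions))
```

The point of the sketch is only that the classifier never sees a Hinglish example during training: it works on the code-mixed tweets because both vocabularies live in one vector space. With real embeddings the alignment is imperfect, which is consistent with the lower transfer score reported above.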


Bibliographic record

Source
Workshop on Computational Approaches to Code Switching, 2020, pp. 45-51 (7 pages)
Authors
Pranaydeep Singh; Els Lefever
Original format
PDF
Keywords
sentiment analysis; code-mixed text; Hinglish; cross-lingual word embeddings; transfer learning


