International Workshop on Semantic Evaluation

CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis


Abstract

The growing popularity and range of applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching. While recent research into code-switched posts has focused on the use of multilingual word embeddings, these embeddings were not trained on code-switched data. In this work, we present word embeddings trained on code-switched tweets, specifically those that mix Spanish and English, known as Spanglish. We explore the embedding space to discover how it captures the meanings of words in both languages. We test the effectiveness of these embeddings by participating in SemEval 2020 Task 9: Sentiment Analysis on Code-Mixed Social Media Text. We used them to train a sentiment classifier that achieves an F1 score of 0.722, higher than the competition baseline of 0.656; our team (CodaLab username francesita) ranked 14th out of 29 participating teams.
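As a rough illustration of what exploring an embedding space can look like, the sketch below ranks words by cosine similarity to a query word. It is not the authors' code: the vectors in `TOY_EMBEDDINGS` are invented three-dimensional values, and the helper names `cosine_similarity` and `nearest_neighbours` are hypothetical. The idea it shows is that, in a shared space, a Spanish word and its English counterpart can end up as close neighbours.

```python
import math

# Toy 3-dimensional vectors invented for illustration only.
# They are NOT taken from the embeddings described in the abstract.
TOY_EMBEDDINGS = {
    "happy": [0.9, 0.1, 0.0],
    "feliz": [0.8, 0.2, 0.1],
    "sad": [-0.7, 0.3, 0.1],
    "triste": [-0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, embeddings, k=2):
    """Return the k words most similar to `word`, most similar first."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word  # skip the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for neighbour, score in nearest_neighbours("happy", TOY_EMBEDDINGS):
        print(f"{neighbour}\t{score:.3f}")
```

With these toy values the closest neighbour of "happy" is "feliz", and the closest neighbour of "triste" is "sad"; real embeddings trained on code-switched tweets would need to be loaded in place of the toy dictionary.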
