
Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts


Abstract

Sentiment analysis (SA) concerns the classification of texts according to the polarity of the opinions they express. SA systems are highly relevant to many real-world applications (e.g. marketing, eGovernance, business intelligence, behavioral sciences). They also matter for many Natural Language Processing (NLP) tasks, such as information extraction, question answering and textual entailment. The importance of the field is shown by the large number of approaches proposed in research, by the interest it has raised in other disciplines, and by the applications built on its technology. In our case, the primary focus is to use sentiment analysis in the context of media monitoring, to enable tracking of global reactions to events. The main challenge we face is that tweets are written in different languages. An unbiased system should be able to deal with all of them in order to process all available data. Unfortunately, although many linguistic resources exist for processing English texts, data and tools are scarce for many other languages. Following our initial efforts described in (Balahur and Turchi, 2013), in this article we extend our study of the possibility of implementing a multilingual system that is able to: a) classify the sentiment expressed in tweets in various languages using training data obtained through machine translation; b) verify the extent to which translation quality influences sentiment classification performance, in this case on highly informal texts; and c) improve multilingual sentiment classification using small amounts of data annotated in the target language. To this end, we test target-language training sets of varying sizes. The languages we explore are Arabic, Turkish, Russian, Italian, Spanish, German and French.
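Points a) and c) above describe a simple setup. A classifier is trained on machine-translated tweets, then small, growing amounts of natively annotated target-language tweets are added. The sketch below is only a minimal illustration under stated assumptions. It uses a multinomial Naive Bayes classifier with add-one smoothing, chosen here for brevity; the abstract does not say which classifier the authors used. It also assumes the translation step has already produced target-language training text. The names NaiveBayes and learning_curve and all example tweets are hypothetical toy data, not the authors' resources or results.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization only; real tweets would
    # need handling for hashtags, mentions, URLs and emoticons.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative choice)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                num = self.word_counts[label][tok] + 1
                score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, data):
    return sum(model.predict(t) == y for t, y in data) / len(data)


def learning_curve(translated, native, test, sizes):
    """Train on translated data plus the first k native examples, for each k."""
    results = []
    for k in sizes:
        train = translated + native[:k]
        model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
        results.append((k, accuracy(model, test)))
    return results


# Hypothetical toy data (Spanish). TRANSLATED stands in for machine-translated
# training tweets; NATIVE stands in for informal tweets annotated in Spanish.
TRANSLATED = [
    ("me encanta este teléfono", "pos"),
    ("el servicio es excelente", "pos"),
    ("odio esta actualización", "neg"),
    ("el servicio es terrible", "neg"),
]
NATIVE = [
    ("q guay el concierto", "pos"),
    ("menudo rollo de partido", "neg"),
    ("me flipa esta peli", "pos"),
    ("vaya asco de día", "neg"),
]
TEST = [("guay el concierto", "pos"), ("vaya rollo de día", "neg")]

if __name__ == "__main__":
    for k, acc in learning_curve(TRANSLATED, NATIVE, TEST, [0, 2, 4]):
        print(f"native examples: {k}  accuracy: {acc:.2f}")
```

On this toy data, slang such as "guay" or "vaya" never appears in the translated set. Only the natively annotated tweets supply it, which mirrors the motivation for point c). Any real conclusion would require the authors' data and evaluation.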
