Multilingual Sentiment Analysis Using Emoticons and Keywords

Abstract

Nowadays the World Wide Web has evolved into a leading communication channel and information exchange medium. Especially since the introduction of the so-called Web 2.0 and the ensuing explosion of user-generated content, the amount of data available on the internet has attracted the interest of both the scientific and the business communities. Their efforts focus on identifying the inner structure of these data and the knowledge that can be derived by analyzing them. Web 2.0 is the subject of study and research in a number of areas. One of these areas is sentiment analysis, whose main goal is to study and draw conclusions about the subjectivity, polarity and sentiment expressed in user-generated content, which consists mainly of free-text documents. The goal of this paper is to apply sentiment analysis to multilingual data, focusing on documents written in Greek. We developed an integrated framework that accepts user-generated documents and identifies both the polarity of each text (neutral, negative or positive) and the sentiment it expresses (joy, love, anger or sadness). We followed a semi-supervised approach, which led to the development of two techniques for collecting training data automatically, without any human intervention. Our approach detects and uses self-defining features that are available within the data. We take into account two emotionally rich features: a) emoticons and b) lists of emotionally intense keywords. These features are evaluated on data from a popular forum, using various classifiers and feature vectors. Our experimental results lead to several conclusions about the effectiveness, advantages and limitations of applying such methods to Greek data. Using keywords, we achieved 90% mean accuracy in identifying the subjectivity level and 93% in identifying the polarity level, whereas using emoticons the mean accuracy for these two levels was 74% and 77%, respectively.
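The idea of collecting training data automatically from self-defining features can be illustrated with a minimal Python sketch of emoticon- and keyword-based weak labeling. Everything concrete here is an assumption for illustration, not the authors' method: the emoticon sets, the Greek seed keywords, the function names and the toy forum posts are hypothetical; the sketch handles only positive/negative polarity (not the subjectivity level or the four emotions); and a multinomial Naive Bayes classifier stands in for the unspecified "various classifiers". Posts with no marker or with conflicting markers are discarded, and the markers are stripped from the features so the classifier cannot simply relearn the labeling rule.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical seed lists (assumed for illustration, not taken from the paper).
POSITIVE_EMOTICONS = {":)", ":-)", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
POSITIVE_KEYWORDS = {"τέλειο", "υπέροχο"}  # "perfect", "wonderful"
NEGATIVE_KEYWORDS = {"απαίσιο", "χάλια"}   # "awful", "lousy"
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS
ALL_KEYWORDS = POSITIVE_KEYWORDS | NEGATIVE_KEYWORDS

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware by default, so Greek letters match


def tokenize(text):
    """Lowercased word tokens; punctuation (and hence emoticons) is dropped."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def resolve(has_pos, has_neg):
    """Map marker presence to a label; no signal or conflicting signals -> None."""
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"


def collect_training_data(docs, use="emoticons"):
    """Weakly label docs by emoticons or keywords, removing the markers from the features."""
    if use not in ("emoticons", "keywords"):
        raise ValueError("use must be 'emoticons' or 'keywords'")
    data = []
    for text in docs:
        if use == "emoticons":
            label = resolve(any(e in text for e in POSITIVE_EMOTICONS),
                            any(e in text for e in NEGATIVE_EMOTICONS))
            for e in ALL_EMOTICONS:
                text = text.replace(e, " ")
            tokens = tokenize(text)
        else:
            tokens = tokenize(text)
            label = resolve(any(t in POSITIVE_KEYWORDS for t in tokens),
                            any(t in NEGATIVE_KEYWORDS for t in tokens))
            tokens = [t for t in tokens if t not in ALL_KEYWORDS]
        if label is not None:
            data.append((tokens, label))
    return data


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, data):
        self.class_counts = Counter(label for _, label in data)
        self.word_counts = defaultdict(Counter)
        for tokens, label in data:
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens, _ in data for t in tokens}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


posts = [
    "Τέλειο κινητό, πολύ γρήγορο :)",
    "Απαίσιο κινητό, πολύ αργό :(",
    "Γρήγορη παράδοση :)",
    "Αργή παράδοση :-(",
    "Καμία γνώμη ακόμα",     # no emoticon: dropped
    "Καλό αλλά αργό :) :(",  # conflicting emoticons: dropped
]

train = collect_training_data(posts, use="emoticons")
model = NaiveBayes().fit(train)
print(len(train), model.predict(tokenize("πολύ γρήγορο")))  # 4 positive
```

Calling `collect_training_data(posts, use="keywords")` instead labels only the first two posts, since only they contain a seed keyword. Whether the authors removed the labeling markers from their feature vectors is not stated in the abstract.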