
Harnessing Twitter "Big Data" for Automatic Emotion Identification

Abstract

User-generated content on Twitter (produced at an enormous rate of 340 million tweets per day) provides a rich source for gleaning people's emotions, which is necessary for a deeper understanding of people's behaviors and actions. Extant studies on emotion identification lack comprehensive coverage of "emotional situations" because they use relatively small training datasets. To overcome this bottleneck, we have automatically created a large emotion-labeled dataset (of about 2.5 million tweets) by harnessing emotion-related hashtags available in the tweets. We have applied two different machine learning algorithms for emotion identification to study the effectiveness of various feature combinations as well as the effect of the size of the training data on the emotion identification task. Our experiments demonstrate that a combination of unigrams, bigrams, sentiment/emotion-bearing words, and part-of-speech information is most effective for gleaning emotions. The highest accuracy (65.57%) is achieved with training data containing about 2 million tweets.
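
The hashtag-based labeling and the unigram/bigram features described in the abstract can be illustrated with a minimal sketch. The abstract does not give the hashtag list, the emotion categories, or the preprocessing rules, so the `EMOTION_HASHTAGS` map, the rule of skipping tweets with conflicting emotion hashtags, and the helper names `label_tweet` and `ngram_features` are all assumptions made for illustration, not the authors' procedure.

```python
import re

# Hypothetical hashtag-to-emotion map (assumption: the actual list used
# in the paper is not given in the abstract).
EMOTION_HASHTAGS = {
    "#happy": "joy",
    "#joy": "joy",
    "#sad": "sadness",
    "#angry": "anger",
    "#scared": "fear",
}

HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text):
    """Return (cleaned_text, emotion), or None if no single emotion applies."""
    tags = [t.lower() for t in HASHTAG_RE.findall(text)]
    emotions = {EMOTION_HASHTAGS[t] for t in tags if t in EMOTION_HASHTAGS}
    # Assumption: tweets with zero or conflicting emotion hashtags are skipped.
    if len(emotions) != 1:
        return None

    # Strip the label-bearing hashtags so a classifier cannot read the
    # label directly from the text; other hashtags are kept.
    def drop_label_tag(match):
        tag = match.group(0)
        return "" if tag.lower() in EMOTION_HASHTAGS else tag

    cleaned = " ".join(HASHTAG_RE.sub(drop_label_tag, text).split())
    return cleaned, emotions.pop()


def ngram_features(text):
    """Return lowercase unigrams followed by bigrams (joined with '_')."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


if __name__ == "__main__":
    print(label_tweet("Got the job today #happy"))
    print(ngram_features("so very glad"))
```

Labeling of this kind needs no manual annotation, which is what makes a dataset of millions of tweets feasible. The sentiment/emotion-bearing word and part-of-speech features mentioned in the abstract would need a lexicon and a tagger, and are left out of this sketch.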
