Published in: Pacific Asia Conference on Language, Information and Computation

Sentiment Lexicon Interpolation and Polarity Estimation of Objective and Out-Of-Vocabulary Words to Improve Sentiment Classification on Microblogging

Abstract

Sentiment analysis has become an important classification task because a large amount of user-generated content is published on the Internet. Sentiment lexicons have been used successfully to classify the sentiment of user review datasets. More recently, microblogging services such as Twitter have become a popular data source for sentiment analysis. However, analyzing sentiment in tweets remains difficult because tweets are very short and contain slang, informal expressions, emoticons, typos, and many words not found in a dictionary. In addition, more than 90 percent of the words in public sentiment lexicons, such as SentiWordNet, are objective words, which are often considered less important in a classification module. In this paper, we introduce a hybrid approach that incorporates sentiment lexicons into a machine learning approach to improve sentiment classification of tweets. We automatically construct an Add-on lexicon that compiles polarity scores for objective words and out-of-vocabulary (OOV) words from tweet corpora. We also introduce a novel feature weighting method that interpolates sentiment lexicon scores into the unigram vectors used by a Support Vector Machine (SVM). Our experimental results show that the method is effective and significantly improves sentiment classification accuracy compared to a baseline unigram model.
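
The abstract does not give the exact formulas, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that a polarity score for an objective or OOV word can be estimated from how often the word appears in positive versus negative tweets, and that "interpolation" means a linear blend of a unigram presence feature with the absolute lexicon score. The function names, the `alpha` and `min_count` parameters, and both formulas are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def build_addon_lexicon(labeled_tweets, base_lexicon, min_count=2):
    """Estimate polarity in [-1, 1] for words that are absent from, or
    neutral (score 0.0) in, base_lexicon.

    labeled_tweets: iterable of (tokens, label), label is "pos" or "neg".
    The scoring rule (pos - neg) / (pos + neg) is an assumption.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in labeled_tweets:
        target = pos if label == "pos" else neg
        target.update(set(tokens))  # count each word once per tweet

    addon = {}
    for word in set(pos) | set(neg):
        if base_lexicon.get(word, 0.0) != 0.0:
            continue  # word already carries a non-neutral score
        total = pos[word] + neg[word]
        if total >= min_count:  # skip words seen too rarely to score
            addon[word] = (pos[word] - neg[word]) / total
    return addon


def weighted_unigrams(tokens, lexicon, alpha=0.5):
    """Blend a binary presence feature with the absolute lexicon score.

    weight = (1 - alpha) * 1.0 + alpha * |score|  (hypothetical rule)
    """
    return {
        word: (1 - alpha) * 1.0 + alpha * abs(lexicon.get(word, 0.0))
        for word in set(tokens)
    }
```

Under these assumptions, the per-tweet dictionaries returned by `weighted_unigrams` could be converted to sparse vectors (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM; setting `alpha=0` recovers a plain binary unigram baseline.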
