Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Sentiment analysis of Chinese micro-blog using vector space model


Abstract

In recent years, mining micro-blogs has become a hot research field, especially because it may create commercial and political value in a fast-changing big data era. This paper investigates sentiment analysis of Chinese micro-blogs (SACM) using a vector space model. Based on an analysis of the natural properties of Chinese micro-blogs, a sentiment analysis system is proposed that formulates the task as a two-class classification problem: positive sentiment or negative sentiment. To achieve robust results, a preprocessing approach has been developed, based on an analysis of dynamic and complicated micro-blog expressions, to remove emotionally unrelated words, convert traditional Chinese expressions to simplified ones, and unify the punctuation. In addition, with the aid of word segmentation and frequency statistics, a vector space model is formed to generate the sentiment-related micro-blog feature vector. A support vector machine (SVM) is used as the classifier for its excellent ability to solve two-class classification problems. Experiments have been carried out to evaluate the proposed sentiment analysis system. Three different databases are used in the word segmentation stage: the emotion dictionary from Dalian University of Technology, the CNKI-HowNet emotional dictionary, and a self-established dictionary. Experimental results show that the proposed SACM system achieves 80.86% classification accuracy using these databases.
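
The preprocessing and feature-vector steps described in the abstract might look roughly like the following minimal Python sketch. It is not the authors' implementation: the toy `LEXICON`, the tiny `TRAD_TO_SIMP` table, the punctuation map, and the choice of forward maximum matching for segmentation are all assumptions made for illustration. The abstract does not say which segmentation algorithm or term weighting was used.

```python
from collections import Counter

# Hypothetical toy sentiment lexicon. The paper draws on the Dalian University
# of Technology dictionary, the CNKI-HowNet dictionary and a self-built one.
LEXICON = ["开心", "喜欢", "讨厌", "失望", "不开心"]

# Tiny illustrative traditional-to-simplified table (assumption; a real
# system would need a complete conversion table).
TRAD_TO_SIMP = {"開": "开", "歡": "欢", "討": "讨", "厭": "厌"}

# Unify full-width punctuation to a single ASCII form (assumed convention).
PUNCT = str.maketrans({"！": "!", "？": "?", "，": ",", "。": "."})


def preprocess(text):
    """Convert traditional characters to simplified and unify punctuation."""
    text = "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)
    return text.translate(PUNCT).strip()


def segment(text, lexicon):
    """Forward maximum matching: keep only words found in the lexicon.

    Characters that do not start any lexicon word are skipped, which also
    drops words unrelated to sentiment.
    """
    words = set(lexicon)
    max_len = max(len(w) for w in lexicon)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in words:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            i += 1  # no lexicon word starts here; move one character on
    return tokens


def feature_vector(text, lexicon):
    """Term-frequency vector with one dimension per lexicon word."""
    counts = Counter(segment(preprocess(text), lexicon))
    return [counts[w] for w in lexicon]


if __name__ == "__main__":
    print(feature_vector("今天很開心！喜歡喜歡", LEXICON))
    print(feature_vector("我不開心", LEXICON))
```

Vectors of this form, paired with positive or negative labels, could then be passed to any SVM implementation (for example `sklearn.svm.SVC` from scikit-learn) for the two-class classification step. Trying the longest match first is what lets "不开心" be counted as one negative term instead of as the positive term "开心".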
