International Conference on Natural Language Processing and Knowledge Engineering

Research on sentiment classification of Blog based on PMI-IR

Abstract

The growth of blog text on the Internet has brought new challenges to Chinese text classification. To address the lack of semantic information in traditional Chinese text classification methods, this paper implements a simple unsupervised learning algorithm that classifies a blog as expressing joy, anger, sadness, or fear. The class of a blog text is predicted from the maximum semantic orientation (SO) among the phrases in the text that contain adjectives or adverbs. The SO of a phrase is calculated as the mutual information between the phrase and the polar words, and the SO of the blog text is then determined by the maximum mutual information value. For example, a blog text is classified as joy if the SO of its phrases points to joy. Two corpora are used to test the method: the blog corpus collected by the Monitor and Research Center for National Language Resource (Network Multimedia Sub-branch Center), and the Chinese dataset provided by the COAE2008 task. On both datasets, the method achieves a clear improvement over traditional methods.
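The abstract states the scoring rule only in words. One way to write it down, assuming the usual PMI-IR estimate of mutual information from co-occurrence counts, is given below. Here N is the number of documents in the reference collection, hits(·) counts the documents containing all of the given terms (in classic PMI-IR these are search-engine result counts), W_c is the set of polar words for emotion c, P(b) is the set of adjective/adverb phrases extracted from blog b, and C = {joy, anger, sadness, fear}. Averaging PMI over W_c to obtain SO_c is an assumption for illustration; the abstract does not say how scores against several polar words are combined.

```latex
\begin{align*}
\mathrm{PMI}(p, w) &= \log_2 \frac{P(p, w)}{P(p)\,P(w)}
  \approx \log_2 \frac{N \cdot \mathrm{hits}(p, w)}{\mathrm{hits}(p)\,\mathrm{hits}(w)} \\
\mathrm{SO}_c(p) &= \frac{1}{|W_c|} \sum_{w \in W_c} \mathrm{PMI}(p, w) \\
\hat{c}(b) &= \operatorname*{arg\,max}_{c \in C} \, \max_{p \in \mathcal{P}(b)} \mathrm{SO}_c(p)
\end{align*}
```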
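A minimal Python sketch of the pipeline described in the abstract, under several assumptions that are not from the paper: the blog is already segmented and POS-tagged with hypothetical `ADJ`/`ADV` tags; a "phrase" is any pair of adjacent tokens containing an adjective or adverb; hit counts come from document-level co-occurrence in a small local reference corpus instead of search-engine queries; a small constant `eps` smooths zero counts; and an emotion's SO is the mean PMI over its polar words. All function names (`extract_phrases`, `hits`, `pmi`, `semantic_orientation`, `classify`) and the toy data are hypothetical and for illustration only.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical tag names; substitute the adjective/adverb tags of whatever
# Chinese segmenter and POS tagger is actually used.
ADJ_ADV_TAGS = {"ADJ", "ADV"}


def extract_phrases(tagged: Sequence[Tuple[str, str]]) -> List[str]:
    """Adjacent token pairs in which at least one token is an adjective or adverb.

    Tokens are concatenated without a space, as in Chinese text.
    """
    return [
        w1 + w2
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 in ADJ_ADV_TAGS or t2 in ADJ_ADV_TAGS
    ]


def hits(corpus: Sequence[str], *terms: str) -> int:
    """Number of documents containing every term (stand-in for search-engine hits)."""
    return sum(1 for doc in corpus if all(term in doc for term in terms))


def pmi(corpus: Sequence[str], phrase: str, word: str, eps: float = 0.01) -> float:
    """log2 of P(phrase, word) / (P(phrase) P(word)), estimated from document counts.

    eps smooths zero counts so the logarithm is always defined.
    """
    n = len(corpus)
    joint = hits(corpus, phrase, word) + eps
    return math.log2(n * joint / ((hits(corpus, phrase) + eps) * (hits(corpus, word) + eps)))


def semantic_orientation(corpus: Sequence[str], phrase: str, polar_words: Sequence[str]) -> float:
    """Assumed SO of a phrase toward one emotion: mean PMI with that emotion's polar words."""
    return sum(pmi(corpus, phrase, w) for w in polar_words) / len(polar_words)


def classify(
    tagged: Sequence[Tuple[str, str]],
    corpus: Sequence[str],
    polar: Dict[str, List[str]],
) -> Optional[str]:
    """Emotion of the single (phrase, emotion) pair with the highest SO, or None."""
    best_label: Optional[str] = None
    best_so = -math.inf
    for phrase in extract_phrases(tagged):
        if hits(corpus, phrase) == 0:
            continue  # an unseen phrase carries no co-occurrence evidence
        for label, words in polar.items():
            so = semantic_orientation(corpus, phrase, words)
            if so > best_so:
                best_label, best_so = label, so
    return best_label


# Toy reference corpus and one polar word per emotion (illustrative only).
TOY_CORPUS = [
    "今天非常高兴，真开心",
    "非常高兴见到你，开心",
    "他很生气",
    "我很难过",
    "我有点害怕",
    "开心的一天",
]
POLAR_WORDS = {"joy": ["开心"], "anger": ["生气"], "sadness": ["难过"], "fear": ["害怕"]}

print(classify([("非常", "ADV"), ("高兴", "ADJ")], TOY_CORPUS, POLAR_WORDS))  # joy
```

The toy corpus is far too small for meaningful scores; it only exercises the code path. Skipping phrases that never occur in the reference corpus is a deliberate choice: with smoothing alone, an unseen phrase would score highest against whichever polar word is rarest, which says nothing about the blog.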
