Journal: International Journal of Computer Systems Science & Engineering

The Method for Extracting New Login Sentiment Words From Chinese Micro-Blog Based on Improved Mutual Information



Abstract

Current methods of extracting new login sentiment words not only ignore the diversity of patterns formed by new multi-character words (the number of words is greater than two), but also disregard the influence of other new words that co-occur with a new word connoting sentiment. To solve this problem, this paper proposes a method for extracting new login sentiment words from Chinese micro-blog based on improved mutual information. First, the micro-blog data are preprocessed to account for noise such as web links and punctuation. Candidate strings are then obtained from the preprocessed data by applying the N-gram segmentation method. Next, an extraction algorithm for new login words is proposed, which combines multi-character mutual information (MMI) with left and right adjacent entropy. In this algorithm, the MMI describes the internal cohesion of a multi-word candidate string across its various constituent patterns. The candidate strings are then extended and filtered according to frequency, MMI, and left and right adjacent entropy to extract new login words. Finally, an algorithm for extracting new login sentiment words is proposed. In this algorithm, the Sentiment Similarity between Words (SW) is computed to measure how similar a new login word is in sentiment to existing sentiment words and to other new login sentiment words. The sentiment tendency values of new login words are then obtained from the SW and used to extract new login sentiment words. Experimental results show that this method is effective for extracting new login sentiment words.
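
The candidate-generation and filtering steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the MMI formula, so this sketch substitutes the minimum pointwise mutual information over all binary splits of a candidate, which is an assumption and not the paper's definition. The function names and all threshold values are likewise hypothetical, and the candidate-extension step is omitted.

```python
import math
from collections import Counter, defaultdict


def ngram_candidates(texts, max_n=4):
    """Count every character n-gram (1..max_n) and its left/right neighbours."""
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                gram = text[i:i + n]
                counts[gram] += 1
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + n < len(text):
                    right[gram][text[i + n]] += 1
    return counts, left, right


def entropy(neighbours):
    """Shannon entropy (bits) of a neighbour-character distribution."""
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in neighbours.values())


def min_split_pmi(gram, counts, total_chars):
    """Lowest PMI over all binary splits of gram.

    Assumption: a stand-in for the paper's MMI, whose formula is not
    given in the abstract.
    """
    p_gram = counts[gram] / total_chars
    scores = []
    for k in range(1, len(gram)):
        p_a = counts[gram[:k]] / total_chars
        p_b = counts[gram[k:]] / total_chars
        scores.append(math.log2(p_gram / (p_a * p_b)))
    return min(scores)


def extract_new_words(texts, max_n=4, min_freq=2,
                      min_pmi=1.0, min_entropy=0.5):
    """Keep candidates that pass frequency, cohesion and boundary filters.

    All thresholds are illustrative values, not taken from the paper.
    """
    counts, left, right = ngram_candidates(texts, max_n)
    total_chars = sum(c for g, c in counts.items() if len(g) == 1)
    result = []
    for gram, freq in counts.items():
        if len(gram) < 2 or freq < min_freq:
            continue
        if min_split_pmi(gram, counts, total_chars) < min_pmi:
            continue
        if min(entropy(left[gram]), entropy(right[gram])) < min_entropy:
            continue
        result.append(gram)
    return result
```

Taking the minimum over splits means a candidate is kept only if every way of cutting it in two shows strong cohesion, while the adjacent-entropy filter requires varied characters on both sides of the candidate, a common signal that the string forms a free-standing word.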
