Chinese journal: Computer Technology and Development (计算机技术与发展)

Research on New Word Recognition in Internet Public Opinion Monitoring

     

Abstract

In internet public opinion monitoring, the suddenness of events and the flood of internet slang produce large numbers of new words and new character strings, and a finite segmentation dictionary can do little to recognize them. Existing segmentation systems split these unrecognized strings into scattered fragments, which severely degrades the accuracy of hot-word and keyword extraction and has become a bottleneck for improving the performance of public opinion monitoring systems. This paper analyzes the main strengths and weaknesses of several current word segmentation techniques. It then exploits a property of keywords that are missing from the dictionary: within the monitored text they occur with locally high frequency. By computing the cohesion (bonding degree) between anomalous segments and the segments around them, the method identifies keywords that the dictionary does not contain. Experimental results show that the proposed segmentation algorithm can recognize such out-of-dictionary keywords and, compared with traditional segmentation algorithms, is better suited to internet public opinion monitoring.
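The abstract describes the idea but not the exact formulas. The Python sketch below illustrates the pipeline under stated assumptions: forward maximum matching stands in for a dictionary-based baseline segmenter, out-of-dictionary tokens are treated as the "anomalous segments", "local high frequency" is approximated by a minimum pair count within one batch of posts about the same event, and the Dice coefficient of adjacent-token counts substitutes for the paper's cohesion measure, which the abstract does not define. The functions `fmm_segment`, `merge_pair` and `find_new_words`, the thresholds `min_count` and `min_dice`, and the toy dictionary are all hypothetical.

```python
from collections import Counter


def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary
    word; if nothing matches, emit a single character (a possible fragment)."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every non-overlapping occurrence of the adjacent pair (a, b) with a + b."""
    a, b = pair
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + b)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def find_new_words(docs: list[str], dictionary: set[str], min_count: int = 3,
                   min_dice: float = 0.6, max_merges: int = 20) -> dict[str, int]:
    """Repeatedly merge the most cohesive adjacent pair that involves an
    out-of-dictionary token, then report the merged multi-character tokens.
    Hypothetical sketch; not the paper's published algorithm."""
    segmented = [fmm_segment(doc, dictionary) for doc in docs]
    for _ in range(max_merges):
        unigrams = Counter(tok for toks in segmented for tok in toks)
        pairs = Counter(
            (a, b)
            for toks in segmented
            for a, b in zip(toks, toks[1:])
            if a not in dictionary or b not in dictionary  # at least one anomalous segment
        )
        best, best_dice = None, 0.0
        for (a, b), n_ab in pairs.items():
            if n_ab < min_count:  # not locally high-frequency in this batch
                continue
            # Dice coefficient: an assumed stand-in for the paper's cohesion measure
            dice = 2 * n_ab / (unigrams[a] + unigrams[b])
            if dice >= min_dice and dice > best_dice:
                best, best_dice = (a, b), dice
        if best is None:
            break
        segmented = [merge_pair(toks, best) for toks in segmented]
    counts = Counter(tok for toks in segmented for tok in toks)
    return {tok: n for tok, n in counts.items() if len(tok) > 1 and tok not in dictionary}


if __name__ == "__main__":
    # Hypothetical toy dictionary and posts about one event; "给力" is the new word.
    dictionary = {"网络", "舆情", "监控", "事件", "引发", "热议", "今天", "的"}
    docs = ["给力事件引发热议", "今天的给力事件", "网络舆情监控给力"]
    print(fmm_segment(docs[2], dictionary))  # ['网络', '舆情', '监控', '给', '力']
    print(find_new_words(docs, dictionary))  # {'给力': 3}
```

Merging only the single best pair per round lets a new word longer than two characters be rebuilt from its fragments over several rounds. The thresholds trade recall for precision: on the same toy data with `min_count=2`, the phrase 给力事件 is merged as well.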
