首页> 中文期刊> 《信息网络安全》 >敏感话题发现中的增量型文本聚类模型

敏感话题发现中的增量型文本聚类模型

         

摘要

Faced with the huge amounts of news data which updated on the Internet all the time, Sensitive Topic Detection and Tracking has become an important research now. In this paper, we discuss and research the incremental text clustering algorithm for sensitive topic detection in a online consensus analysis system. We introduce the related work of text clustering. Based on the Single-pass algorithm, we improve its performance and propose a new incremental text clustering algorithm which based on simhash. Based on the real online news corpus from the online consensus analysis system, we conduct an experiment to test and verify the feasibility and effectiveness of the algorithm we proposed. The result shows that the new algorithm is much more efficient compared to the original Single-pass clustering algorithm. In the real application, the new incremental text clustering algorithm basically meet the real-time demand of online topic detection and has a certain practical value.%面对网络上更新快速的海量新闻,如何快速、有效地从中自动发现敏感话题并进行持续跟踪是当下研究的热点.文章以网络舆情分析系统为应用背景,针对其敏感话题发现过程,通过对TDT领域应用较多的Single-pass算法进行改进,提出了一种基于相似哈希的增量型文本聚类算法.基于实际应用中抓取到的新闻文本数据,实验结果表明,文章提出的算法相比于原Single-pass算法在聚类效率方面具有明显提升.从实际应用的效果来看,该算法达到了实时话题发现的预期需求,具有较高的实用价值.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号