Journal: 《电信科学》 (Telecommunications Science)

A Micro-blog Topic Discovery Method Combining Feature Word Selection and Similarity Fusion (特征词选择与相似度融合的微博话题发现方法)

         

Abstract

Micro-blog short texts often contain words that are identical or similar in form or meaning yet only loosely related to the topic. Such words distort text similarity calculations and thereby degrade the quality of topic discovery. This paper proposes a feature word selection algorithm that combines the content of micro-blog short texts with their structured information in order to extract representative feature words. It also improves the strategy for computing the similarity between a text and a topic. The feature word selection algorithm and the similarity calculation are then combined and applied to micro-blog text data to discover topics. Experimental results show that the method effectively reduces the average miss rate and false detection rate and improves the quality of topic discovery.
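The abstract does not give the weighting scheme or the similarity formula, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions. It assumes that the "structured information" is a hashtag, that feature words are weighted by term frequency with a hypothetical `HASHTAG_BOOST`, and that topics are formed by single-pass clustering with cosine similarity against a topic centroid. The names `select_features`, `cosine`, `discover_topics`, `STOP_WORDS`, and the threshold value are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and boost factor; neither comes from the paper.
STOP_WORDS = {"the", "a", "is", "to", "of", "and", "in"}
HASHTAG_BOOST = 3.0


def select_features(post, top_k=5):
    """Return {word: weight} for the top_k highest-weighted words of a post.

    Assumption: each occurrence of a word that also appears as a hashtag
    counts HASHTAG_BOOST instead of 1.
    """
    text = post.lower()
    hashtags = set(re.findall(r"#(\w+)", text))
    words = [w for w in re.findall(r"\w+", text) if w not in STOP_WORDS]
    weights = Counter()
    for w in words:
        weights[w] += HASHTAG_BOOST if w in hashtags else 1.0
    return dict(weights.most_common(top_k))


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_topics(posts, threshold=0.3):
    """Single-pass clustering: join the most similar topic or open a new one."""
    topics = []  # each topic: {"centroid": Counter, "members": [post indices]}
    for i, post in enumerate(posts):
        vec = select_features(post)
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["centroid"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # add the post's weights to the centroid
            best["members"].append(i)
        else:
            topics.append({"centroid": Counter(vec), "members": [i]})
    return topics


posts = [
    "#earthquake# strong earthquake felt downtown",
    "rescue teams respond after the earthquake downtown",
    "#worldcup# final tonight, who wins the worldcup",
    "watching the worldcup final with friends",
]
print([t["members"] for t in discover_topics(posts)])  # prints [[0, 1], [2, 3]]
```

In this toy run the boosted hashtag words dominate each centroid, so later posts that mention the same word join the matching topic even though most of their other words differ. A real system would need Chinese word segmentation and a tuned threshold, both of which are outside this sketch.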
