Journal: 《网络与信息安全学报》 (Chinese Journal of Network and Information Security)

A Chinese microblog topic detection and tracking method combining temporal and semantic features

         

Abstract

Microblog posts are short, spread quickly, and shift topics frequently, which poses new challenges for social topic detection and tracking. This paper proposes a systematic framework for microblog topic detection and tracking, based on microblog clustering that combines the temporal trend of topics with short-text semantic similarity. First, a temporal frequent word set is defined for microblog text, and a feature word selection method for hot topics is derived from it. Second, initial microblog clusters are obtained from the temporal frequent feature word set using maximal frequent itemsets. Because documents may be shared between initial clusters, an overlap elimination algorithm based on extended short-text semantic membership is proposed, which yields fully separated initial clusters. Finally, an agglomerative topic clustering method is given that uses the cluster semantic similarity matrix. Experiments on real-world data from Sina microblog show that the proposed method can be used to detect and track hot topics in Chinese microblogs.
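The last two stages described in the abstract (overlap elimination, then agglomerative merging driven by a cluster similarity matrix) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the extended semantic membership or the cluster semantic similarity, so plain Jaccard word overlap stands in for both. The function names, the `threshold` parameter, and the assumption that frequent feature word sets are already available are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity; an assumed stand-in for the paper's semantic measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def initial_clusters(posts, feature_sets):
    """Put each post index into every cluster whose feature word set it fully contains."""
    return [
        {i for i, words in enumerate(posts) if fs <= set(words)}
        for fs in feature_sets
    ]


def remove_overlap(posts, clusters):
    """Keep each shared post only in the cluster whose other members it resembles most."""
    result = [set(c) for c in clusters]
    for i in range(len(posts)):
        owners = [k for k, c in enumerate(clusters) if i in c]
        if len(owners) < 2:
            continue

        def membership(k):
            # Mean similarity between post i and the other posts of cluster k.
            others = [posts[j] for j in clusters[k] if j != i]
            if not others:
                return 0.0
            return sum(jaccard(posts[i], o) for o in others) / len(others)

        best = max(owners, key=membership)
        for k in owners:
            if k != best:
                result[k].discard(i)
    return [c for c in result if c]


def cluster_similarity(posts, c1, c2):
    """Average pairwise similarity between two non-empty clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(posts[i], posts[j]) for i, j in pairs) / len(pairs)


def agglomerate(posts, clusters, threshold):
    """Repeatedly merge the most similar pair until no pair reaches the threshold."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        (a, b), sim = max(
            (((a, b), cluster_similarity(posts, clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if sim < threshold:
            break
        clusters[a] |= clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


# Toy data: each post is a list of already-segmented words.
posts = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "rescue", "donation"],
    ["earthquake", "donation", "fund", "charity"],
    ["football", "match", "goal"],
    ["football", "goal", "score"],
]
feature_sets = [{"earthquake", "rescue"}, {"earthquake", "donation"}, {"football", "goal"}]

initial = initial_clusters(posts, feature_sets)
separated = remove_overlap(posts, initial)
topics = agglomerate(posts, separated, threshold=0.2)
```

In this toy run the second post initially belongs to two clusters. The overlap step keeps it only in the cluster it resembles more, and the merge step then groups the two earthquake clusters into one topic. A real system would replace `jaccard` with a semantic similarity suited to short Chinese text and would mine the feature sets from time-windowed word frequencies.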
