Journal: 《计算机应用与软件》 (Computer Applications and Software)

A Microblog Hot Topic Discovery Method Based on Outlier Elimination (一种消除孤立点的微博热点话题发现方法)


Abstract

Microblog posts are numerous and short, and they cover a wide range of topics. As a result, microblog data contain many isolated points (outliers), and these outliers degrade the clustering algorithms used to find hot topics. We therefore propose a microblog hot topic discovery method based on outlier elimination. First, outliers are removed from the dataset. Next, the CURE (Clustering Using Representatives) algorithm clusters the remaining data that are worth clustering. Finally, the effectiveness of the algorithm is verified on examples. The results show that, compared with the baseline clustering algorithm, the proposed algorithm makes the clustering result less sensitive to outliers, improves the accuracy of microblog hot topic discovery, and runs more efficiently. It is therefore better suited to large-scale microblog hot topic discovery.
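The two-stage pipeline in the abstract removes outliers first and then clusters with CURE. It can be illustrated with the minimal Python sketch below, which rests on several assumptions. The abstract does not say how posts are vectorized or how outliers are detected, so this sketch assumes the posts are already numeric vectors. It also swaps in a hypothetical k-nearest-neighbour distance filter (`remove_outliers`, with hypothetical parameters `k` and `factor`) for the paper's outlier step. The clustering stage is a simplified CURE. Each cluster keeps up to `c` well-scattered points, shrunk toward its centroid by `alpha`, and at each step the two clusters whose representatives are closest are merged. The full algorithm's random sampling, partitioning, heap and k-d tree are omitted. The names `c`, `alpha` and `n_clusters` are illustrative and are not taken from the paper.

```python
import math
import statistics
from itertools import combinations

# Assumption: each microblog post is already a numeric vector (e.g. TF-IDF);
# the abstract does not specify the representation.


def remove_outliers(points, k=2, factor=2.0):
    """Hypothetical outlier filter (not the paper's stated method): score each
    point by its mean distance to its k nearest neighbours and drop points
    whose score exceeds `factor` times the median score."""
    if len(points) <= k:
        return list(points)
    scores = []
    for i, p in enumerate(points):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        scores.append(sum(nearest) / len(nearest))
    threshold = factor * statistics.median(scores)
    return [p for p, s in zip(points, scores) if s <= threshold]


def _centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def _representatives(cluster, c, alpha):
    """Pick up to c well-scattered points (farthest-point heuristic),
    then shrink each one toward the cluster centroid by the fraction alpha."""
    cen = _centroid(cluster)
    scattered = [max(cluster, key=lambda p: math.dist(p, cen))]
    while len(scattered) < min(c, len(cluster)):
        scattered.append(max(cluster, key=lambda p: min(math.dist(p, s) for s in scattered)))
    return [tuple(x + alpha * (m - x) for x, m in zip(p, cen)) for p in scattered]


def cure(points, n_clusters, c=4, alpha=0.3):
    """Simplified CURE: agglomerative merging by closest representative points.
    Omits CURE's sampling, partitioning, heap and k-d tree (naive O(n^3))."""
    clusters = [[p] for p in points]
    reps = [_representatives(cl, c, alpha) for cl in clusters]

    def rep_distance(pair):
        i, j = pair
        return min(math.dist(a, b) for a in reps[i] for b in reps[j])

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2), key=rep_distance)
        merged = clusters[i] + clusters[j]
        for idx in (j, i):  # combinations yields i < j, so delete j first
            del clusters[idx]
            del reps[idx]
        clusters.append(merged)
        reps.append(_representatives(merged, c, alpha))
    return clusters


if __name__ == "__main__":
    # Toy 2-D vectors: two tight groups plus one far-away point.
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
            (20.0, -20.0)]
    kept = remove_outliers(data)
    print(len(kept))  # 6: (20.0, -20.0) is filtered out
    for cl in sorted(sorted(cl) for cl in cure(kept, n_clusters=2)):
        print(cl)
```

In the toy run the distant point is filtered out before clustering, so it cannot distort either group's representatives. This is the sensitivity to outliers that the abstract says the method reduces. Shrinking representatives toward the centroid is CURE's own mechanism for dampening outliers that survive the filter.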
