Journal: Computer Engineering & Science (《计算机工程与科学》)

Research on Microblog Public Opinion Analysis Based on an Improved K-means Algorithm

         

Abstract

If an isolated point is selected as an initial cluster center, the clustering result can become trapped in a local optimum. To avoid this, we propose a density-based method for selecting the initial cluster centers of the K-means clustering algorithm. The method first computes the average similarity between each data object and all other data objects, and treats the objects whose average similarity exceeds a fixed threshold as core objects. The core objects that are least similar to one another are then taken as the initial cluster centers. Using a self-built crawler for Sina Microblog, we collected several thousand posts from different categories. After word segmentation, preprocessing, and term-weight calculation, we clustered the posts with the improved K-means algorithm. Compared with the traditional K-means algorithm, the improved algorithm gives more stable precision and recall, and its average clustering time is shorter. Experimental results show that the improved algorithm achieves higher accuracy and better stability in microblog clustering, which helps discover trending public opinion in large volumes of microblog data.
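The abstract describes the seeding rule only in outline. The Python sketch below shows one way it could work, under stated assumptions. Posts are assumed to be already segmented and weighted into numeric term-weight vectors; the abstract does not name the weighting scheme, so TF-IDF is only one possibility. Similarity is assumed to be cosine similarity. "Least similar to one another" is interpreted as follows: seed with the least similar pair of core objects, then greedily add the core object whose highest similarity to the already chosen centers is lowest. The function names, the `threshold` value, and the toy data are hypothetical and are not the authors' implementation.

```python
import math

# Sketch only: cosine similarity and the greedy seeding rule are assumptions,
# not the paper's code.


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_initial_centers(vectors, k, threshold):
    """Density-based seeding: return the indices of k mutually dissimilar core objects."""
    n = len(vectors)
    if k < 2 or n < k:
        raise ValueError("this sketch needs 2 <= k <= number of objects")
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Average similarity of each object to all *other* objects (its "density").
    avg = [(sum(sim[i]) - sim[i][i]) / (n - 1) for i in range(n)]
    # Core objects: average similarity above the fixed threshold; isolated points drop out.
    core = [i for i in range(n) if avg[i] > threshold]
    if len(core) < k:
        raise ValueError("fewer core objects than k; lower the threshold")
    # Assumed seeding rule: start from the least similar pair of core objects...
    pairs = ((a, b) for a in core for b in core if a < b)
    centers = list(min(pairs, key=lambda p: sim[p[0]][p[1]]))
    # ...then add the core object whose highest similarity to the chosen centers is lowest.
    while len(centers) < k:
        rest = [c for c in core if c not in centers]
        centers.append(min(rest, key=lambda c: max(sim[c][s] for s in centers)))
    return centers


def kmeans(vectors, k, threshold, max_iter=100):
    """K-means with cosine-similarity assignment, seeded by select_initial_centers."""
    centers = [list(vectors[i]) for i in select_initial_centers(vectors, k, threshold)]
    labels = None
    for _ in range(max_iter):
        # Assign each object to its most similar center.
        new_labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Recompute each center as the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Hypothetical term-weight vectors: two topics plus one isolated post (index 6).
    posts = [
        [3, 1, 0, 0, 0], [2, 1, 0, 0, 0], [3, 2, 0, 0, 0],
        [0, 0, 2, 3, 0], [0, 0, 1, 2, 0], [0, 0, 2, 2, 0],
        [0, 0, 0, 1, 3],
    ]
    print(select_initial_centers(posts, 2, threshold=0.2))  # prints [0, 3]
    labels, _ = kmeans(posts, 2, threshold=0.2)
    print(labels)  # prints [0, 0, 0, 1, 1, 1, 1]
```

In this toy run, the isolated post at index 6 has an average similarity of about 0.13, below the threshold, so it never becomes a core object and can never be chosen as a seed. That is exactly the failure mode the method is meant to avoid. The post still joins a cluster during the ordinary K-means iterations.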
