Journal: 《计算机学报》 (Chinese Journal of Computers)

Research on an Effective Clustering Algorithm for Hot Topic Time Series


Abstract

Clustering popularity ("hot degree") time series is an important step in revealing and modeling how hot topics on the Web form and develop. In 2010, Leskovec et al. proposed the K-Spectral Centroid (K_SC) clustering algorithm for topic time series, which achieves relatively high accuracy and captures the intrinsic development trends of topics well. However, K_SC is highly sensitive to the initial cluster centroids and has high time complexity, which makes it difficult to apply to real high-dimensional, large-scale datasets. Building on the wavelet transform, this paper proposes a new iterative clustering algorithm, WKSC, with two main contributions: (1) the original time series are compressed with the Haar wavelet transform, which lowers their dimensionality, so that topics are first grouped on the lower-dimensional series and the time complexity of clustering is reduced; (2) during the inverse Haar wavelet transform, the centroids returned by clustering at the lower dimension are used as the initial centroids for clustering at the higher dimension, which mitigates the high sensitivity to initial centroids over the iterations and improves clustering quality. Extensive experiments were conducted on three datasets drawn from domestic and international sources. The results show that WKSC significantly reduces the time complexity of clustering while also improving clustering quality, so that it is well suited to pattern analysis of large numbers of high-dimensional hot-topic time series.
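The coarse-to-fine idea in contributions (1) and (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the function names (`haar_approx`, `haar_upsample`, `ksc`, `wksc`), the number of Haar levels, and the random seeding are all assumptions made for illustration. The clustering step is also a simplified stand-in for K_SC: it uses a scale-invariant distance only and omits any alignment of series along the time axis. Series are assumed to be non-negative, non-zero, and of a length divisible by `2**levels`.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_approx(series):
    """One Haar level: keep only approximation coefficients (halves the length)."""
    return [(series[i] + series[i + 1]) / SQRT2 for i in range(0, len(series), 2)]

def haar_upsample(coeffs):
    """Inverse Haar step with every detail coefficient assumed to be zero."""
    return [c / SQRT2 for c in coeffs for _ in range(2)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def nearest(x, centers):
    # For unit vectors, the scale-invariant squared distance is 1 - (x . c)^2.
    return min(range(len(centers)), key=lambda k: 1.0 - dot(x, centers[k]) ** 2)

def spectral_centroid(members, n_power=100):
    """Dominant eigenvector of sum(x x^T), found by power iteration."""
    v = unit([sum(col) for col in zip(*members)])
    for _ in range(n_power):
        proj = [dot(x, v) for x in members]
        v = unit([sum(p * x[j] for p, x in zip(proj, members)) for j in range(len(v))])
    return v

def ksc(data, centers, n_iter=50):
    """Simplified K_SC (assumption: scaling invariance only, no time shift)."""
    xs = [unit(x) for x in data]
    centers = [unit(c) for c in centers]
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(x, centers) for x in xs]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centers[k] = spectral_centroid(members)
    return labels, centers

def wksc(data, k, levels=2, seed=0):
    """Cluster at the coarsest Haar level, then refine level by level."""
    assert len(data[0]) % (2 ** levels) == 0, "length must be divisible by 2**levels"
    pyramid = [data]
    for _ in range(levels):
        pyramid.append([haar_approx(s) for s in pyramid[-1]])
    # Random seeding happens only once, on the shortest (cheapest) series.
    centers = random.Random(seed).sample(pyramid[-1], k)
    labels, centers = ksc(pyramid[-1], centers)
    for finer in reversed(pyramid[:-1]):
        # Upsampled coarse centroids initialise the next, higher-dimensional pass.
        labels, centers = ksc(finer, [haar_upsample(c) for c in centers])
    return labels, centers
```

In this sketch the random initialisation touches only the shortest series, and every higher-resolution pass starts from centroids that have already converged one level below. That is the mechanism the abstract credits for both the lower running time and the reduced sensitivity to initial centroids.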
