Journal: Science Technology and Engineering (《科学技术与工程》)

A Real-Time Streaming Controllable Clustering Algorithm for Associated Big Data in a Cloud Computing Environment

             

Abstract

To address the drawbacks of traditional clustering algorithms, namely low efficiency, poor clustering quality, and weak stability, a new real-time streaming controllable clustering algorithm for associated big data in a cloud computing environment is proposed. The definition and characteristics of associated real-time streaming data are introduced. A coarse clustering stage preprocesses data tuples as they arrive in real time, determining the number of clusters and the positions of their center points and forming a set of distinct macro-clusters; this stage uses the Canopy algorithm. The macro-clusters obtained from coarse clustering are then passed to the K-means algorithm, which completes the fine clustering; the detailed steps of the K-means algorithm and of the whole fine clustering process are given. Experimental results show that the proposed algorithm offers high efficiency, good clustering quality, and strong stability, and can effectively cluster associated real-time streaming big data in a cloud computing environment.
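The two-stage idea in the abstract (Canopy for coarse clustering, K-means for fine clustering) can be illustrated with a minimal sketch. This is not the paper's implementation: it processes one in-memory batch of tuples rather than a live stream, uses Euclidean distance, and the function names and the thresholds `t1` and `t2` are hypothetical choices made for illustration. The abstract does not specify any of these details.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def canopy(points, t1, t2):
    """Coarse stage: group points into canopies (macro-clusters).

    t1 is the loose threshold and t2 the tight one (t1 > t2); both are
    assumed tuning parameters, not values taken from the paper.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        still_candidates = []
        for p in remaining:
            d = dist(center, p)
            if d < t1:
                members.append(p)           # loosely belongs to this canopy
            if d >= t2:
                still_candidates.append(p)  # may still seed or join another canopy
        remaining = still_candidates
        canopies.append(members)
    return canopies


def assign(points, centers):
    """Assign every point to its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, centers, max_iter=100):
    """Fine stage: standard K-means iterations from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Keep the old center if a cluster happens to be empty.
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def coarse_then_fine(points, t1, t2):
    """Canopy fixes the number of clusters and initial centers; K-means refines them."""
    macro_clusters = canopy(points, t1, t2)
    initial_centers = [mean(m) for m in macro_clusters]
    return kmeans(points, initial_centers)


if __name__ == "__main__":
    batch = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, clusters = coarse_then_fine(batch, t1=3.0, t2=1.5)
    print(len(centers), [len(c) for c in clusters])
```

In this sketch the coarse stage removes the need to pick the number of clusters by hand, because the number of canopies becomes the K used by K-means. A streaming variant would presumably rerun or incrementally update these stages per time window, but the abstract does not say how.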
