Journal: Computer Technology and Development (《计算机技术与发展》)

A Distributed Data Stream Clustering Algorithm and Its Implementation on Storm

Abstract

To improve the efficiency of data stream clustering, this paper proposes CDD-Stream, a data stream clustering algorithm based on centroid distance and density grids. By parallelizing the update of its grid structure, the paper further proposes a distributed data stream clustering algorithm, DCD-Stream (Distributed Centroid Distance D-Stream). The algorithm has an online part and an offline part. The online part receives the data stream in real time and uses local and global nodes to parallelize grid-structure updates, completing an incremental update of the overall grid structure. The offline part performs global clustering on the updated grid structure and stores grid frames so that users can query historical clusters. To exploit Storm's fast real-time stream processing and its ability to markedly improve the performance of data stream mining algorithms, an implementation scheme for DCD-Stream on Storm is designed and implemented. The scheme uses the in-memory database Redis and the message middleware Kafka to deploy and implement the DCD-Stream topology. Comparative experiments show that, relative to other algorithms, DCD-Stream achieves fairly high clustering accuracy and better timeliness on data stream objects, and that the Storm-based implementation scheme is feasible and effective.
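The split between local nodes, a global node, and offline clustering can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the abstract does not give the decay rule, the density threshold, or how centroid distance enters the clustering, so the constants `CELL`, `DECAY`, and `DENSE`, all function names, and the plain neighbor-cell merging below are assumptions in the style of generic density-grid clustering. Storm, Kafka, and Redis are left out entirely.

```python
from itertools import product

CELL = 1.0    # hypothetical grid cell width (not given in the abstract)
DECAY = 0.9   # hypothetical decay factor applied on each global merge
DENSE = 2.0   # hypothetical density threshold for a "dense" cell


def local_update(points):
    """Local node: summarize one batch as {cell: (count, coordinate sums)}."""
    partial = {}
    for p in points:
        cell = tuple(int(x // CELL) for x in p)
        count, sums = partial.get(cell, (0, (0.0,) * len(p)))
        partial[cell] = (count + 1, tuple(s + x for s, x in zip(sums, p)))
    return partial


def global_merge(grid, partials):
    """Global node: decay existing cells, then fold in the partial summaries."""
    merged = {
        cell: (density * DECAY, tuple(s * DECAY for s in sums))
        for cell, (density, sums) in grid.items()
    }
    for partial in partials:
        for cell, (count, sums) in partial.items():
            density, old = merged.get(cell, (0.0, (0.0,) * len(sums)))
            merged[cell] = (density + count,
                            tuple(a + b for a, b in zip(old, sums)))
    return merged


def centroid(grid, cell):
    """Weighted centroid of the points that fell into a cell."""
    density, sums = grid[cell]
    return tuple(s / density for s in sums)


def offline_cluster(grid):
    """Offline part: group neighboring dense cells into clusters.

    Assumption: clusters are connected components of dense cells; the
    paper's centroid-distance criterion is not reproduced here.
    """
    dense = {cell for cell, (density, _) in grid.items() if density >= DENSE}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            cell = stack.pop()
            members.add(cell)
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(a + b for a, b in zip(cell, offset))
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(members)
    return clusters
```

In this sketch each call to `local_update` stands in for one local node processing its share of the stream, `global_merge` plays the role of the global node's incremental grid update, and `offline_cluster` can be run on any stored snapshot of the grid, which is one way to read the abstract's "grid frames" for querying historical clusters.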
