Published in: Computer Technology and Development (计算机技术与发展)

A Sliding-Window-Based Clustering Algorithm for Stream Data

     

Abstract

In practice, users are usually most interested in how a data stream has been distributed over the most recent period. CluStream, a traditional clustering algorithm based on the landmark model, does not discard expired tuples and therefore cannot accurately reflect the current distribution of the stream. The sliding window is an approximation technique for data streams that focuses on recent data. To improve the quality and efficiency of stream clustering, this paper improves CluStream by using a sliding window to support data processing. To reduce the number of computations per iteration of the clustering step, the algorithm performs clustering with an improved k-means. The optimized algorithm discards expired tuples promptly while continuously processing newly arrived tuples in real time, which yields more accurate analysis results. Compared with CluStream, it has lower memory overhead and faster data processing, and its clustering results are clearer and more reasonable.
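The sliding-window idea in the abstract (evict expired tuples, then cluster only what remains) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify its window structure or its improved k-means, so the class name `SlidingWindowClusterer`, the time-based expiry rule, the deterministic initialization, and the use of plain Lloyd-style k-means are all assumptions made for illustration.

```python
from collections import deque
import math


class SlidingWindowClusterer:
    """Hypothetical sketch: cluster only the tuples inside a time window."""

    def __init__(self, window_size, k):
        self.window_size = window_size  # window length in time units (assumed)
        self.k = k                      # number of clusters
        self.window = deque()           # (timestamp, point), oldest first

    def insert(self, timestamp, point):
        """Add a newly arrived tuple, then evict every expired tuple."""
        self.window.append((timestamp, tuple(point)))
        # A tuple is expired once it is window_size or more time units old.
        while self.window and self.window[0][0] <= timestamp - self.window_size:
            self.window.popleft()

    def cluster(self, iterations=10):
        """Run plain k-means (not the paper's improved variant) on the window."""
        points = [p for _, p in self.window]
        if len(points) < self.k:
            return []
        centers = points[:self.k]  # deterministic initialization (assumption)
        for _ in range(iterations):
            groups = [[] for _ in centers]
            for p in points:
                nearest = min(range(len(centers)),
                              key=lambda i: math.dist(p, centers[i]))
                groups[nearest].append(p)
            # Recompute each center as the mean of its group; keep the old
            # center if the group is empty.
            centers = [
                tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
                for i, g in enumerate(groups)
            ]
        return centers
```

Because eviction happens on every insertion, memory stays bounded by the number of tuples that fit in the window, and old tuples stop influencing the cluster centers as soon as they expire. A landmark-style approach, by contrast, would keep accumulating every tuple since the landmark.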
