Conference: 2011 International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FCST-11

Clustering Algorithm for High Dimensional Data Stream over Sliding Windows

Abstract

Data stream clustering faces significant challenges in memory usage and processing speed. In addition, many data streams are high-dimensional by nature, and high-dimensional data are inherently harder to cluster. This paper proposes an effective clustering algorithm, referred to as HSWStream, for high-dimensional data streams over sliding windows. The algorithm handles high dimensionality with a projected clustering technique, tracks in-cluster evolution with an exponential histogram of cluster features (EHCF), and eliminates the influence of old points through fading temporal cluster features (FTCFs). Through the exponential histogram mechanism, the algorithm retains more information about recent data and less about old data, which matches the way data streams evolve. Projected clustering yields higher cluster quality and faster execution, while the sliding window yields higher quality and lower memory usage. To further improve efficiency, a fast computational method is used to maintain the EHCF: a new data point need not be processed immediately, and its handling can be deferred until an FTCF must be deleted from the corresponding EHCF. The evolving data streams in the experiments are drawn from the real KDD-CUP'98 and KDD-CUP'99 data sets as well as from synthetic data sets. The experimental results show that the proposed method achieves higher clustering quality, lower memory usage, and faster processing than the other algorithms compared.
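To make the EHCF/FTCF idea more concrete, here is a minimal sketch of an exponential histogram whose buckets are fading cluster features. It assumes Datar-style exponential-histogram rules (bucket sizes are powers of two, and when a size class holds more than `max_per_size` buckets its two oldest are merged), a decay weight of 2^(-λ·age), and a time-based window in which the oldest bucket is dropped once its newest point has expired. The class names `FTCF` and `EHCF`, the parameters `window`, `max_per_size` and `lam`, and the `dims` argument standing in for a cluster's projected dimensions are all hypothetical. The abstract does not give HSWStream's exact maintenance rules, its method for choosing projected dimensions, or the deferred "fast computational method", so none of these are reproduced here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FTCF:
    """Hypothetical fading temporal cluster feature (one histogram bucket).

    Sums are stored already decayed to time `t_ref` with weight 2 ** (-lam * age).
    """
    lam: float        # assumed decay rate
    weight: float     # faded point count
    ls: List[float]   # faded linear sum per dimension
    ss: List[float]   # faded squared sum per dimension
    size: int         # raw point count; drives exponential-histogram bucket sizes
    t_ref: float      # time the stored sums are faded to
    t_newest: float   # arrival time of the newest point in this bucket

    @classmethod
    def from_point(cls, x, t, lam):
        return cls(lam, 1.0, list(x), [v * v for v in x], 1, t, t)

    def faded(self, now):
        """Return a copy whose sums are decayed forward to time `now`."""
        f = 2.0 ** (-self.lam * max(0.0, now - self.t_ref))
        return FTCF(self.lam, self.weight * f, [v * f for v in self.ls],
                    [v * f for v in self.ss], self.size,
                    max(now, self.t_ref), self.t_newest)

    def merged(self, other):
        """Cluster features are additive once both are faded to a common time."""
        now = max(self.t_ref, other.t_ref)
        a, b = self.faded(now), other.faded(now)
        return FTCF(self.lam, a.weight + b.weight,
                    [p + q for p, q in zip(a.ls, b.ls)],
                    [p + q for p, q in zip(a.ss, b.ss)],
                    a.size + b.size, now, max(a.t_newest, b.t_newest))

    def centroid(self):
        return [v / self.weight for v in self.ls]

    def radius(self, dims):
        """RMS deviation from the centroid over the given (projected) dimensions."""
        c = self.centroid()
        var = [self.ss[d] / self.weight - c[d] ** 2 for d in dims]
        return math.sqrt(max(0.0, sum(var) / len(dims)))


class EHCF:
    """Hypothetical exponential histogram of FTCFs over a time-based sliding window."""

    def __init__(self, window, max_per_size=2, lam=0.01):
        self.window = window
        self.max_per_size = max_per_size  # assumed EH parameter (roughly 1/epsilon)
        self.lam = lam
        self.buckets: List[FTCF] = []     # newest first; sizes non-decreasing

    def insert(self, x, t):
        self.buckets.insert(0, FTCF.from_point(x, t, self.lam))
        self._merge()
        self._expire(t)

    def _merge(self):
        # While some size class is over capacity, merge its two oldest buckets.
        changed = True
        while changed:
            changed = False
            by_size = {}
            for i, b in enumerate(self.buckets):
                by_size.setdefault(b.size, []).append(i)
            for size in sorted(by_size):
                idx = by_size[size]
                if len(idx) > self.max_per_size:
                    i, j = idx[-2], idx[-1]  # adjacent: equal sizes are contiguous
                    self.buckets[i:j + 1] = [self.buckets[i].merged(self.buckets[j])]
                    changed = True
                    break

    def _expire(self, now):
        # Drop the oldest bucket once even its newest point has left the window.
        while self.buckets and self.buckets[-1].t_newest <= now - self.window:
            self.buckets.pop()

    def summary(self, now):
        """Faded cluster feature for everything currently covered by the histogram."""
        if not self.buckets:
            raise ValueError("empty histogram")
        total = self.buckets[0]
        for b in self.buckets[1:]:
            total = total.merged(b)
        return total.faded(now)


if __name__ == "__main__":
    h = EHCF(window=100, max_per_size=2, lam=0.01)
    for t in range(1, 11):
        h.insert([float(t), 2.0 * t], t)
    print([b.size for b in h.buckets])  # [1, 1, 2, 2, 4]
    s = h.summary(10)
    print(s.centroid(), s.radius([0, 1]))
```

The histogram keeps only a logarithmic number of buckets in the window length, with fine-grained buckets for recent points and coarse, merged buckets for older ones. This mirrors the abstract's point that more information is kept about recent data than about old data. Expiration is approximate: the oldest bucket can still hold a few points that have already left the window.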