Clustering Algorithm for High Dimensional Data Stream over Sliding Windows


Abstract

Data stream clustering faces major challenges in memory usage and processing speed. Moreover, much stream data is high-dimensional in nature, and high-dimensional data is inherently harder to cluster. This paper proposes HSWStream, a clustering algorithm for high-dimensional data streams over sliding windows. The algorithm handles high dimensionality with a projected clustering technique, tracks in-cluster evolution with an exponential histogram of cluster features (EHCF), and eliminates the influence of old points with fading temporal cluster features (FTCF). Through the exponential histogram mechanism, it keeps more information about recent data and less about old data, which matches how data streams evolve. Projected clustering yields higher cluster quality and faster execution, while the sliding window yields higher quality and lower memory usage. For further efficiency, a fast computational method maintains the EHCF: a new data point need not be processed immediately, and processing is deferred until an FTCF must be deleted from the corresponding EHCF. The experiments use evolving data streams built from the KDD-CUP'98 and KDD-CUP'99 real data sets and from synthetic data sets. The results show that the proposed method achieves higher cluster quality, lower memory usage, and faster processing than other algorithms.
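The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea behind an exponential histogram of cluster features: recent points are kept in small buckets, older points are merged into progressively larger ones, and buckets that fall out of the sliding window are dropped. The names `Bucket`, `merge`, `ExpHistogramCF`, `window` and `eps`, as well as the cap of ceil(1/eps) + 1 buckets per size, are assumptions made for illustration and are not taken from the paper. The sketch omits fading, projected clustering, and the deferred fast-maintenance step.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    """Summary of a run of consecutive points: a cluster feature plus a timestamp."""
    n: int            # number of points summarized
    ls: List[float]   # per-dimension linear sum
    ss: List[float]   # per-dimension sum of squares
    t: int            # arrival time of the newest point in the bucket


def merge(old: Bucket, new: Bucket) -> Bucket:
    """Cluster features are additive, so merging is element-wise addition."""
    return Bucket(
        n=old.n + new.n,
        ls=[a + b for a, b in zip(old.ls, new.ls)],
        ss=[a + b for a, b in zip(old.ss, new.ss)],
        t=max(old.t, new.t),
    )


class ExpHistogramCF:
    """Exponential histogram of cluster features for ONE cluster (illustrative)."""

    def __init__(self, window: int, eps: float):
        self.window = window                     # sliding-window length in time steps
        self.max_same = math.ceil(1 / eps) + 1   # assumed cap on equal-sized buckets
        self.buckets: List[Bucket] = []          # ordered oldest -> newest

    def insert(self, x: List[float], now: int) -> None:
        # Drop buckets whose newest point has left the window.
        self.buckets = [b for b in self.buckets if b.t > now - self.window]
        # The new point starts as a bucket of size 1.
        self.buckets.append(Bucket(1, list(x), [v * v for v in x], now))
        # Cascade: while too many buckets share a size, merge the two oldest.
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(same) <= self.max_same:
                break
            i, j = same[0], same[1]
            self.buckets[i] = merge(self.buckets[i], self.buckets[j])
            del self.buckets[j]
            size *= 2

    def count(self) -> int:
        return sum(b.n for b in self.buckets)

    def centroid(self) -> List[float]:
        n = self.count()
        dims = len(self.buckets[0].ls)
        return [sum(b.ls[d] for b in self.buckets) / n for d in range(dims)]
```

With this layout the number of buckets grows roughly logarithmically with the window length, which is the sense in which the histogram stores more detail for recent data than for old data. Because a bucket expires as a whole, the summary near the window boundary is approximate rather than exact.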


Bibliographic record

Source
2011 International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FCST-11, 2011, pp. 1537-1542 (6 pages)
Authors
Liu Weiguo; OuYang Jia
Original format
PDF
Chinese Library Classification
Security and confidentiality
Keywords
clustering algorithm; data stream; exponential histogram; projected clustering; sliding window


