A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data

Chen Jin-Yin; He Hui-Hao


Abstract

Most data streams encountered in real life consist of data objects with mixed numerical and categorical attributes. Most current data stream clustering algorithms suffer from low clustering quality, difficulty in determining cluster centers, and poor handling of outliers. A fast density-based data stream clustering algorithm is proposed in which cluster centers are determined automatically in the initialization stage. Based on an analysis of data attribute relationships, mixed data sets are classified into three types, and a corresponding distance metric is designed for each type. Based on the field intensity-distance distribution graph for each data object, a linear regression model and residual analysis are used to find the outliers of the graph, which enables automatic determination of the cluster centers. After the cluster centers are found, all data objects can be clustered according to their distance to the centers. The data stream clustering algorithm adopts an online/offline two-stage processing framework and a new micro cluster characteristic vector to maintain the arriving data objects dynamically. A micro cluster decay function and a micro cluster deletion mechanism are used to maintain the micro clusters, so that they reflect the evolution of the data stream accurately. Finally, the performance of the proposed algorithm is evaluated in a series of experiments on real-world mixed data sets, in comparison with several well-known clustering algorithms, in terms of clustering purity, efficiency and time complexity. (C) 2016 Elsevier Inc. All rights reserved.
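
The abstract names a micro cluster decay function and a deletion mechanism but does not give their form. The sketch below is only an illustration of the general idea, not the authors' method. It assumes an exponential fading weight of the form 2^(-lam * dt), as commonly used in density-based stream clustering, and a simple weight threshold for deletion. The names `MicroCluster`, `absorb`, `prune`, `lam` and `min_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical micro cluster summary: a decayed weight and a timestamp."""
    weight: float = 0.0
    last_update: float = 0.0

    def decay_to(self, now: float, lam: float) -> None:
        # Assumed exponential fading: the weight halves every 1/lam time units.
        self.weight *= 2.0 ** (-lam * (now - self.last_update))
        self.last_update = now

    def absorb(self, now: float, lam: float) -> None:
        # Fade the old weight first, then count the newly arrived object.
        self.decay_to(now, lam)
        self.weight += 1.0


def prune(clusters: dict, now: float, lam: float, min_weight: float) -> dict:
    """Decay every micro cluster to `now` and keep those at or above `min_weight`."""
    kept = {}
    for key, mc in clusters.items():
        mc.decay_to(now, lam)
        if mc.weight >= min_weight:
            kept[key] = mc
    return kept


if __name__ == "__main__":
    lam = 0.5
    clusters = {"a": MicroCluster(), "b": MicroCluster()}
    clusters["a"].absorb(now=0.0, lam=lam)
    clusters["b"].absorb(now=0.0, lam=lam)
    clusters["a"].absorb(now=2.0, lam=lam)   # "a" keeps receiving objects
    clusters = prune(clusters, now=4.0, lam=lam, min_weight=0.5)
    print(sorted(clusters))                  # prints ['a']
```

In this toy run, micro cluster "b" receives no objects after time 0, so its weight fades to 0.25 by time 4 and it is deleted, while "a" stays at 0.75. A real implementation would also carry the characteristic vector the abstract mentions, whose contents are not specified here.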


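Bibliographic details

Source: Information Sciences: An International Journal, 2016 (issue not given), 23 pages
Authors: Chen Jin-Yin; He Hui-Hao
Original format: PDF
Language: English
CLC classification: Automatic information theory; Computer applications; Information and knowledge dissemination; Automation technology, computer technology
Keywords: Data mining; Mixed attributes; Data stream clustering; Peak field intensity; Mixed distance measure metrics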


Similar literature

1. A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data [J]. Chen Jin-Yin, He Hui-Hao. Information Sciences: An International Journal, 2016.
2. On Density-Based Data Streams Clustering Algorithms: A Survey [J]. Amineh Amini, Teh Ying Wah, Hadi Saboohi. Journal of Computer Science and Technology, 2014, issue 1.
3. Ant Colony Stream Clustering: A Fast Density Clustering Algorithm for Dynamic Data Streams [J]. Fahy Conor, Yang Shengxiang, Gongora Mario. IEEE Transactions on Cybernetics, 2019, issue 6.
4. A Comparative Study of Density-based Clustering Algorithms on Data Streams: Micro-clustering Approaches [C]. Amineh Amini, Teh Ying Wah. Intelligent control and innovative computing, 2011.
5. Scalable frameworks and algorithms for cluster ensembles and clustering data streams [D]. Hore, Prodip. 2007.
6. Fast Nonparametric Density-Based Clustering of Large Data Sets Using a Stochastic Approximation Mean-Shift Algorithm [O]. Ollivier Hyrien, Andrea Baran.
7. On Density-based Clustering Algorithms over Evolving Data Streams: A Summarization Paradigm [O]. Amineh Amini, Teh Ying Wah. 2016.
