EDDS: An Enhanced Density-based Method for Clustering Data Streams

Abstract

Data stream clustering is an active area of research in big data. It refers to clustering constantly arriving data records and updating existing cluster patterns and outliers in light of the newly arriving data. Density-based algorithms for this problem show promise for finding arbitrarily shaped clusters and detecting anomalies without prior knowledge of the number of clusters. In this paper, a new incremental algorithm, Enhanced Density-based Data Stream (EDDS), is developed to overcome limitations of existing solutions. The algorithm detects clusters and outliers in an incoming data chunk, merges new clusters from the chunk with the existing clusters, and filters out new outliers for the next round. It modifies the traditional DBSCAN algorithm to summarise each cluster in terms of a set of surface-core points. The algorithm applies the density-reachable concept of DBSCAN as its merging strategy and prunes the internal core points using a heuristic solution. It also removes aged core points and outliers according to a fading function. The paper investigates three versions of the algorithm, one for each of three possible cluster representations: all core points are maintained (EDDS-I), only core points of the new clusters from the incoming chunk are kept (EDDS-II), or only the surface-core points of the cluster shapes are kept (EDDS-III). The comparison examines the balance between the efficiency gained by the algorithm and the overhead time spent pruning internal core points. The algorithm was evaluated on selected datasets using various quality measures. The experimental results indicate improved clustering correctness, with comparable time complexity, over existing solutions to the same kind of problem.
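
The chunk-by-chunk workflow described in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation. The function names, the parameters, and the exponential fading function `2 ** (-decay * age)` are all assumptions made for illustration, since the abstract does not specify them. The sketch keeps every core point (closest in spirit to the EDDS-I representation), omits the surface-core pruning heuristic, and carries outliers forward without ageing them.

```python
import math

def neighbours(p, points, eps):
    """All points within distance eps of p (p itself included if present)."""
    return [q for q in points if math.dist(p, q) <= eps]

def dbscan_chunk(points, eps, min_pts):
    """Plain DBSCAN on one chunk: returns (core-point groups, outliers)."""
    core = {p for p in points if len(neighbours(p, points, eps)) >= min_pts}
    groups, seen = [], set()
    for p in core:
        if p in seen:
            continue
        # Grow one cluster over core points chained within eps of each other.
        group, stack = set(), [p]
        while stack:
            q = stack.pop()
            if q in seen:
                continue
            seen.add(q)
            group.add(q)
            stack.extend(r for r in neighbours(q, core, eps) if r not in seen)
        groups.append(group)
    # Points not within eps of any core point are treated as outliers.
    covered = {q for p in core for q in neighbours(p, points, eps)}
    return groups, [p for p in points if p not in covered]

def fade(model, now, decay, min_weight):
    """Drop core points whose assumed weight 2**(-decay*age) is too low."""
    faded = []
    for cluster in model:
        alive = {p: t for p, t in cluster.items()
                 if 2 ** (-decay * (now - t)) >= min_weight}
        if alive:
            faded.append(alive)
    return faded

def merge(model, groups, now, eps):
    """Merge a new group into every existing cluster it can reach within eps."""
    for group in groups:
        merged = {p: now for p in group}
        kept = []
        for cluster in model:
            if any(math.dist(p, q) <= eps for p in group for q in cluster):
                for q, t in cluster.items():
                    merged.setdefault(q, t)  # keep newer timestamp on clashes
            else:
                kept.append(cluster)
        model = kept + [merged]
    return model

def process_chunk(model, pending, chunk, now,
                  eps=1.0, min_pts=3, decay=0.5, min_weight=0.1):
    """One round: cluster the chunk plus earlier outliers, fade, then merge."""
    groups, outliers = dbscan_chunk(pending + chunk, eps, min_pts)
    model = merge(fade(model, now, decay, min_weight), groups, now, eps)
    return model, outliers
```

Here `model` is a list of clusters, each a dictionary mapping a core point (a tuple of coordinates) to the timestamp of the chunk that produced it, and `pending` is the list of outliers returned by the previous round. A caller would start with `model, pending = [], []` and call `process_chunk` once per arriving chunk with an increasing `now`. Note that dropping aged core points could in principle split a cluster; the sketch does not re-check connectivity after fading.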

Bibliographic details

Source: International Workshop on Embedded Multicore Systems | 2017 | 320p | 10 pages
Authors: Ammar Al Abd Alazeez; Sabah Jassim; Hongbo Du
Original format: PDF
Chinese Library Classification: TP311.133.2-53
Keywords: Big data; Data stream clustering; Outlier detection; Density-based approaches; DBSCAN

Similar documents

1. Density-Based Clustering Method for Trends Analysis Using Evolving Data Stream [J]. Umesh Kokate, Arviand V. Deshpande, Parikshit N. Mahalle. International Journal of Synthetic Emotions, 2020, No. 2.
2. A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data [J]. Chen Jin-Yin, He Hui-Hao. Information Sciences: An International Journal, 2016.
3. LeaDen-Stream: A Leader Density-Based Clustering Algorithm over Evolving Data Stream [J]. Amineh Amini, Teh Ying Wah. Journal of Computer and Communications, 2013, No. 5.
4. EDDS: An Enhanced Density-based Method for Clustering Data Streams [C]. Ammar Al Abd Alazeez, Sabah Jassim, Hongbo Du. International Workshop on Embedded Multicore Systems, 2017.
5. Image reconstruction of muon tomographic data using a density-based clustering method [D]. Perry, Kimberly B. 2015.
6. SOTXTSTREAM: Density-based self-organizing clustering of text streams [O]. Avory C. Bryant, Krzysztof J. Cios. 2011.
7. On Density-based Clustering Algorithms over Evolving Data Streams: A Summarization Paradigm [O]. Amineh Amini, Teh Ying Wah. 2016.
