IEEE International Conference on Data Engineering

DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces


Abstract

DBSCAN is one of the most important density-based clustering methods, with a wide range of applications in areas such as machine learning and data mining. However, the rapidly growing volume and variety of data challenge traditional DBSCAN, and thus distributed DBSCAN in metric spaces is required. In this paper, we propose DBSCAN-MS, a distributed density-based clustering method in metric spaces. To ensure load balancing, we present a k-d tree based partitioning approach. It utilizes pivots to map the data in metric spaces to vector spaces, and employs a k-d tree partitioning technique to divide the data equally. To avoid unnecessary computation and communication cost, we propose a framework that divides the data into partitions, finds the local DBSCAN results, and merges the local results based on a merging graph. In addition, pivot filtering and sliding window techniques are used in the framework for pruning. Extensive experiments with both real and synthetic datasets demonstrate the efficiency and scalability of the proposed DBSCAN-MS.
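
The pivot mapping, pivot filtering, and k-d tree partitioning mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`map_to_pivot_space`, `pivot_lower_bound`, `can_prune`, `kd_partition`) are hypothetical, Euclidean distance stands in for an arbitrary metric, and the median split is only one plausible way to divide the data equally. The filter relies on the triangle inequality: for any pivot p, |d(a, p) - d(b, p)| <= d(a, b), so a pair whose lower bound exceeds eps cannot be eps-neighbors.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in metric (assumption); any metric distance could be used."""
    return math.dist(a, b)


def map_to_pivot_space(obj, pivots, dist: Callable) -> Vector:
    """Map an object to the vector of its distances to each pivot."""
    return tuple(dist(obj, p) for p in pivots)


def pivot_lower_bound(va: Vector, vb: Vector) -> float:
    """Lower bound on the original distance, from the triangle inequality."""
    return max(abs(x - y) for x, y in zip(va, vb))


def can_prune(va: Vector, vb: Vector, eps: float) -> bool:
    """True if the pair cannot be within eps, so the real distance is skipped."""
    return pivot_lower_bound(va, vb) > eps


def kd_partition(items: List[Tuple[object, Vector]], depth: int, max_depth: int):
    """Recursively split (object, mapped_vector) pairs at the median,
    cycling through the pivot-space dimensions, to get balanced partitions."""
    if depth == max_depth or len(items) <= 1:
        return [items]
    axis = depth % len(items[0][1])
    ordered = sorted(items, key=lambda it: it[1][axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], depth + 1, max_depth)
            + kd_partition(ordered[mid:], depth + 1, max_depth))


# Hypothetical toy data: two pivots and eight points on a line.
pivots = [(0.0, 0.0), (10.0, 0.0)]
points = [(float(i), 0.0) for i in range(8)]
mapped = [(p, map_to_pivot_space(p, pivots, euclidean)) for p in points]
partitions = kd_partition(mapped, depth=0, max_depth=2)
```

With `max_depth=2`, the eight toy points end up in four partitions of two points each. A real distributed setting would also need to handle points near partition borders, which this sketch leaves out.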
