A highly scalable clustering scheme using boundary information


Abstract

Many advanced clustering techniques are effective at handling datasets in complicated situations. However, on large datasets, which are increasingly common in the era of big data, the running time of most existing techniques can quickly become intolerable. To tackle this challenge, we propose Scalable Clustering Using Boundary Information (SCUBI), a highly flexible and scalable clustering scheme. The idea of SCUBI is to first identify the boundary points of the original dataset and then group these boundary points into suitable clusters using an existing clustering technique. Finally, each remaining point is assigned to the same cluster as its nearest boundary point. To demonstrate the effectiveness and scalability of SCUBI, we plug the well-known DBSCAN algorithm into SCUBI. Comprehensive experiments are conducted on datasets with up to two million data points to compare the clustering results and time efficiency of DBSCAN and SCUBI-DBSCAN. Experimental results show that our method obtains clustering results almost identical to those of standard DBSCAN while achieving orders-of-magnitude speedups, especially on large datasets, which confirms the scalability of SCUBI. Experiments are also performed with other clustering algorithms of high time complexity to verify the flexibility of SCUBI. (C) 2017 Elsevier B.V. All rights reserved.
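
The three steps described in the abstract (find boundary points, cluster only those, then assign every other point to the cluster of its nearest boundary point) can be illustrated with a minimal Python sketch. The abstract does not say how boundary points are detected, so `find_boundary` below is a hypothetical heuristic (neighbour-centroid offset), not the authors' method. `link_clusters` is a simple stand-in for the base clusterer, not DBSCAN itself. All function and parameter names are assumptions made for illustration, and the brute-force neighbour search is quadratic, so the sketch shows only the structure of the scheme, not its reported speedup.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def knn(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def find_boundary(points: List[Point], k: int, threshold: float) -> List[int]:
    """ASSUMED heuristic, not taken from the paper: flag a point when the
    centroid of its k nearest neighbours lies far from the point itself,
    relative to the mean neighbour distance."""
    boundary = []
    for i, p in enumerate(points):
        nbrs = knn(points, i, k)
        centroid = [sum(points[j][d] for j in nbrs) / len(nbrs)
                    for d in range(len(p))]
        mean_dist = sum(math.dist(p, points[j]) for j in nbrs) / len(nbrs)
        if mean_dist > 0 and math.dist(p, centroid) > threshold * mean_dist:
            boundary.append(i)
    return boundary


def link_clusters(points: List[Point], eps: float) -> List[int]:
    """Stand-in base clusterer: connected components of the graph that
    links every pair of points at distance <= eps."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            a = stack.pop()
            for b in range(len(points)):
                if labels[b] == -1 and math.dist(points[a], points[b]) <= eps:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels


def scubi_sketch(points: List[Point],
                 cluster_fn: Callable[[List[Point]], List[int]],
                 k: int = 5, threshold: float = 0.3) -> List[int]:
    """Boundary-first clustering following the three steps in the abstract."""
    labels = [-1] * len(points)
    # Step 1: identify boundary points.
    boundary = find_boundary(points, k, threshold)
    if not boundary:
        return labels  # nothing to cluster; every point stays unlabelled
    # Step 2: run the plugged-in clusterer on the boundary points only.
    boundary_labels = cluster_fn([points[i] for i in boundary])
    for i, lab in zip(boundary, boundary_labels):
        labels[i] = lab
    # Step 3: each remaining point inherits its nearest boundary point's label.
    is_boundary = set(boundary)
    for i, p in enumerate(points):
        if i not in is_boundary:
            nearest = min(boundary, key=lambda b: math.dist(p, points[b]))
            labels[i] = labels[nearest]
    return labels
```

Any function that maps a list of points to a list of integer labels can be passed as `cluster_fn`, for example `lambda pts: link_clusters(pts, 1.5)`; this mirrors the plug-in role that DBSCAN plays in SCUBI-DBSCAN.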

Bibliographic details

  • Source
    Pattern Recognition Letters | 2017, Issue 1 | pp. 1-7 | 7 pages
  • Authors

    Tong Qiuhui; Li Xiu; Yuan Bo;

  • Affiliations

    Tsinghua Univ, Grad Sch Shenzhen, Intelligent Comp Lab, Div Informat, Shenzhen 518055, Peoples R China;

    Tsinghua Univ, Grad Sch Shenzhen, Intelligent Comp Lab, Div Informat, Shenzhen 518055, Peoples R China;

    Tsinghua Univ, Grad Sch Shenzhen, Intelligent Comp Lab, Div Informat, Shenzhen 518055, Peoples R China;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Keywords

    Clustering; DBSCAN; Cluster boundary; Density gradient;

