2017 International Conference on Information, Cybernetics, and Computational Social Systems

An efficient density-based clustering for multi-dimensional database

Abstract

Cluster analysis aims to group data elements into categories according to their similarity. It is a common task in data mining and is useful in many fields, including pattern recognition, machine learning, and information retrieval. Because clustering has been studied extensively, many methods have been proposed in the literature, some of which focus on mining clusters with arbitrary shapes. However, for large-scale, multi-dimensional data there is still a need for an efficient and versatile clustering method that can identify arbitrarily shaped clusters embedded in a multi-dimensional space. In this paper, we propose a density-based clustering algorithm that adopts a divide-and-conquer strategy. To handle large-scale, multi-dimensional data, we first partition the data into grid cells. This is very efficient in large-scale cases where other algorithms often fail. Moreover, rather than requiring the grid cell width to be tuned by hand, we present a way to determine it automatically. We then propose a flood-filling-like algorithm that identifies clusters with arbitrary shapes over these grid cells. Finally, extensive experiments on both synthetic and real-world databases show that the proposed algorithm efficiently finds accurate clusters in both low-dimensional and multi-dimensional databases.
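The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general grid-plus-flood-fill idea it describes, under stated assumptions. The width rule in the hypothetical `auto_cell_width` (size cubic cells so that uniformly spread data would put about `target_per_cell` points in each), the density threshold `min_pts`, and the use of face adjacency between cells are all illustrative choices, not the authors' method. Face adjacency keeps the neighbor count at 2d per cell instead of the 3^d - 1 of full corner adjacency, which matters as dimensionality grows.

```python
import math
from collections import defaultdict, deque


def auto_cell_width(points, target_per_cell=5):
    """Hypothetical heuristic (not the paper's rule): choose a cubic cell width
    so that, if the points filled their bounding box uniformly, each cell
    would hold about `target_per_cell` points. Assumes `points` is non-empty."""
    d = len(points[0])
    volume = 1.0
    for j in range(d):
        col = [p[j] for p in points]
        extent = max(col) - min(col)
        volume *= extent if extent > 0 else 1.0  # guard degenerate dimensions
    n_cells = max(1.0, len(points) / target_per_cell)
    return (volume / n_cells) ** (1.0 / d)


def grid_cells(points, width):
    """Divide step: map every point to the integer index tuple of its grid cell."""
    d = len(points[0])
    lows = [min(p[j] for p in points) for j in range(d)]
    cells = defaultdict(list)  # cell index tuple -> indices of points in it
    for i, p in enumerate(points):
        key = tuple(int(math.floor((p[j] - lows[j]) / width)) for j in range(d))
        cells[key].append(i)
    return cells


def flood_fill_clusters(points, width=None, min_pts=3):
    """Flood-fill-like step: join face-adjacent dense cells into clusters.
    Returns one label per point; -1 marks points in sparse cells (noise)."""
    if width is None:
        width = auto_cell_width(points)
    cells = grid_cells(points, width)
    dense = {c for c, members in cells.items() if len(members) >= min_pts}
    cell_label = {}
    next_label = 0
    for seed in dense:
        if seed in cell_label:
            continue
        cell_label[seed] = next_label
        queue = deque([seed])
        while queue:  # breadth-first flood fill starting from the seed cell
            cell = queue.popleft()
            for j in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:j] + (cell[j] + step,) + cell[j + 1:]
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        queue.append(nb)
        next_label += 1
    labels = [-1] * len(points)
    for c, label in cell_label.items():
        for i in cells[c]:
            labels[i] = label
    return labels
```

For instance, calling `flood_fill_clusters(points, width=1.0, min_pts=3)` on two tight groups of points that lie far apart yields two distinct labels, while an isolated point falls in a sparse cell and is labeled `-1`. Cluster ids are arbitrary, so compare groupings rather than specific label values.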