An efficient density-based clustering for multi-dimensional database


Abstract

Cluster analysis aims to classify data elements into different categories according to their similarity. It is a common task in data mining and is useful in many fields, including pattern recognition, machine learning, and information retrieval. Because clustering is an extensively studied area, many clustering methods have been proposed in the literature, and some of them focus on mining clusters with arbitrary shapes. However, for large-scale, multi-dimensional data there is still a need for an efficient and versatile clustering method that can identify arbitrarily shaped clusters embedded in a multi-dimensional space. In this paper, we propose a density-based clustering algorithm that adopts a divide-and-conquer strategy. To handle large-scale, multi-dimensional data, we first partition the data into grid cells. The approach is very efficient in large-scale cases where other algorithms often fail. Moreover, rather than requiring the grid cell width to be tuned by hand, we present a way to determine it automatically. We then propose a flood-fill-like algorithm that identifies clusters of arbitrary shape over these grid cells. Finally, extensive experiments on both synthetic and real-world databases show that the proposed algorithm efficiently finds accurate clusters in both low-dimensional and multi-dimensional databases.
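As a rough illustration of the divide-and-conquer idea described in the abstract (partition the points into grid cells, then flood-fill over dense cells), here is a minimal Python sketch. It is not the authors' algorithm: the function name `grid_cluster`, the parameters `width` and `min_pts`, the rule that a cell is dense when it holds at least `min_pts` points, and the choice of face-adjacent connectivity are all hypothetical assumptions made for illustration. The cell width is passed in by hand because the abstract does not say how the paper determines it automatically.

```python
import math
from collections import deque


def grid_cluster(points, width, min_pts):
    """Hypothetical grid-plus-flood-fill clustering sketch (not the paper's method).

    points:  list of equal-length tuples of floats
    width:   grid cell side length, supplied by the caller (assumption)
    min_pts: a cell counts as dense if it holds at least this many points (assumption)
    Returns one cluster id per point; -1 marks points in sparse cells (noise).
    """
    if width <= 0:
        raise ValueError("width must be positive")

    # Divide: map each point to the integer coordinates of its grid cell.
    # Only occupied cells are stored, so empty regions of the grid cost nothing.
    cells = {}
    for i, p in enumerate(points):
        key = tuple(math.floor(x / width) for x in p)
        cells.setdefault(key, []).append(i)

    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    # Face-adjacent neighbour offsets: +1/-1 along one axis at a time (2*d offsets).
    dims = len(points[0]) if points else 0
    offsets = []
    for axis in range(dims):
        for step in (-1, 1):
            offset = [0] * dims
            offset[axis] = step
            offsets.append(tuple(offset))

    labels = [-1] * len(points)
    cell_label = {}
    next_id = 0
    # Conquer: breadth-first flood fill; each connected region of dense cells
    # becomes one cluster, whatever its shape. Sorting makes ids deterministic.
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for i in cells[cell]:
                labels[i] = next_id
            for offset in offsets:
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in cell_label:
                    cell_label[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1
    return labels


points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
          (3.1, 0.2), (3.3, 0.4), (3.2, 0.1),
          (5.0, 5.0)]
print(grid_cluster(points, width=1.0, min_pts=2))  # → [0, 0, 0, 1, 1, 1, -1]
```

Storing only occupied cells in a dictionary keeps memory proportional to the number of non-empty cells rather than to the full grid, which would grow exponentially with the dimension. Connecting only face-adjacent cells gives 2d neighbours per cell; also connecting diagonal neighbours would raise that to 3^d - 1. In this sketch, points in sparse cells are simply marked as noise.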


Bibliographic details

Source: 2017 International Conference on Information, Cybernetics, and Computational Social Systems, 2017, pp. 361-366 (6 pages)
Conference location: Dalian, China
Authors: Lieliang Zhang; Zhiyang Li; Weijiang Liu; Wenyu Qu; Yinan Wu
Affiliations:
College of Information Science and Technology, Dalian Maritime University, Dalian, China
College of Information Science and Technology, Dalian Maritime University, Dalian, China
College of Information Science and Technology, Dalian Maritime University, Dalian, China
School of Computer Software, Tianjin University, Tianjin, China
87 Unit, 91550 of PLA, Dalian, China

Original format: PDF
Language: English
Keywords: Clustering algorithms; Databases; Data mining; Algorithm design and analysis; Partitioning algorithms; Clustering methods; Shape


Similar literature

1. An efficient automated incremental density-based algorithm for clustering and classification [J]. Elham Azhir, Nima Jafari Navimipour, Mehdi Hosseinzadeh. Future Generation Computer Systems, 2021, no. Jana.
2. FEM-DBSCAN: An Efficient Density-Based Clustering Approach [J]. Kazemi Uranus, Boostani Reza. Iranian Journal of Science and Technology, Transactions of Electrical Engineering, 2021, no. 3.
3. An efficient density-based clustering algorithm with circle-filtering strategy [J]. Xiao Xu. International Journal of Collaborative Intelligence, 2020, no. 2.
4. An efficient density-based clustering for multi-dimensional database [C]. Lieliang Zhang, Zhiyang Li, Weijiang Liu. International Conference on Information, Cybernetics and Computational Social Systems, 2017.
5. Efficient declustering and indexing techniques for temporal databases and information retrieval [D]. Behl, Sanjiv. 2002.
6. A Novel Fundus Image Reading Tool for Efficient Generation of a Multi-dimensional Categorical Image Database for Machine Learning Algorithm Training [O]. Sang Jun Park, Joo Young Shin, Sangkeun Kim. 2018.
7. Efficient density-based methods for knowledge discovery in databases [O]. Krieger Ralph. 2008.
