Conference: ACM SIGMOD International Conference on Management of Data

OPTICS: Ordering Points To Identify the Clustering Structure

Abstract

Cluster analysis is a primary method for database mining. It is used either as a stand-alone tool to gain insight into the distribution of a data set, e.g., to focus further analysis and data processing, or as a preprocessing step for other algorithms that operate on the detected clusters. Almost all well-known clustering algorithms require input parameters that are hard to determine but have a significant influence on the clustering result. Furthermore, for many real data sets there is not even a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for cluster analysis that does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g., representative points, arbitrarily shaped clusters), but also the intrinsic clustering structure. For medium-sized data sets, the cluster-ordering can be represented graphically, and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure, offering additional insights into the distribution and correlation of the data.
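The abstract does not spell out how the augmented ordering is built, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes the usual OPTICS notions of a generating distance `eps`, a density threshold `min_pts`, a core distance, and a reachability distance; these names, the brute-force neighborhood search, and the helper `split_at` are illustrative choices that do not appear in the abstract. Each point is visited once and stored with its reachability distance; cutting the resulting sequence at a threshold then gives a rough flat clustering for that density level.

```python
import heapq
import math


def optics_ordering(points, eps, min_pts):
    """Return (order, reach): the visit order of point indices and, per
    point index, its reachability distance (math.inf means undefined)."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbors(i):
        # Brute-force eps-neighborhood of point i (includes i itself).
        pairs = ((math.dist(points[i], points[j]), j) for j in range(n))
        return [(d, j) for d, j in pairs if d <= eps]

    def core_distance(nbrs):
        # Distance to the min_pts-th closest neighbor, or None when the
        # neighborhood is too sparse for the point to be a core point.
        if len(nbrs) < min_pts:
            return None
        return sorted(d for d, _ in nbrs)[min_pts - 1]

    def update(nbrs, core, seeds):
        # Lower the reachability of unprocessed neighbors where possible.
        for d, j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, d)
            if new_reach < reach[j]:
                reach[j] = new_reach
                # Older heap entries for j become stale and are skipped.
                heapq.heappush(seeds, (new_reach, j))

    def visit(i, seeds):
        processed[i] = True
        order.append(i)
        nbrs = neighbors(i)
        core = core_distance(nbrs)
        if core is not None:
            update(nbrs, core, seeds)

    for start in range(n):
        if processed[start]:
            continue
        seeds = []
        visit(start, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if not processed[j]:
                visit(j, seeds)
    return order, reach


def split_at(order, reach, threshold):
    """Simplified cut of the ordering (hypothetical helper): start a new
    group whenever reachability exceeds the threshold. Isolated points
    end up as singleton groups rather than being labeled as noise."""
    clusters = []
    for i in order:
        if reach[i] > threshold or not clusters:
            clusters.append([])
        clusters[-1].append(i)
    return clusters


# Two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
order, reach = optics_ordering(points, eps=20.0, min_pts=3)
clusters = split_at(order, reach, threshold=2.0)
print(order)
print([round(reach[i], 2) for i in order])
print(clusters)
```

Plotting the reachability values in visit order gives the kind of graphical representation the abstract mentions: dense regions appear as valleys and the jump between the two groups as a single tall bar. Choosing a different `threshold` on the same ordering corresponds to a different density setting, without recomputing the ordering.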
