OPTICS

Abstract

Cluster analysis is a primary method for database mining. It is used either as a stand-alone tool to gain insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrarily shaped clusters), but also the intrinsic clustering structure. For medium-sized data sets, the cluster-ordering can be represented graphically, and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure, offering additional insights into the distribution and correlation of the data.
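
To make the idea of an "augmented ordering" more concrete, the following is a minimal Python sketch of how such a cluster-ordering might be computed and then cut at different thresholds. The abstract itself does not define the procedure, so everything here is an assumption for illustration: the function names `optics_order` and `cut_clusters`, the parameters `eps` and `min_pts`, the brute-force neighbor search, and the use of core distance and reachability distance as they are commonly described for OPTICS. The cut function is a deliberate simplification that does not separate noise points.

```python
import heapq
import math


def optics_order(points, eps, min_pts):
    """Return (order, reach): a visiting order of point indices and the
    reachability value stored for each point (illustrative sketch only)."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbors(i):
        # Brute-force eps-neighborhood as (distance, index); includes i itself.
        out = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                out.append((d, j))
        return out

    def core_distance(nbrs):
        # Distance to the min_pts-th nearest neighbor (the point itself
        # counts), or infinity if the neighborhood is too sparse.
        if len(nbrs) < min_pts:
            return math.inf
        return sorted(d for d, _ in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # priority queue keyed by reachability
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale queue entry, a better one was handled earlier
            processed[i] = True
            order.append(i)
            nbrs = neighbors(i)
            cd = core_distance(nbrs)
            if cd == math.inf:
                continue  # not a core point, so it cannot expand the ordering
            for d, j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(cd, d)
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return order, reach


def cut_clusters(order, reach, threshold):
    """Simplified extraction: start a new group whenever the stored
    reachability exceeds the threshold (noise is not separated)."""
    labels = {}
    cluster_id = -1
    for i in order:
        if reach[i] > threshold:
            cluster_id += 1
        labels[i] = cluster_id
    return labels
```

Under these assumptions, the ordering is computed once and `cut_clusters` can then be called with several thresholds. For two well-separated groups of points, a small threshold yields two groups and a large one merges them, which is one way to read the abstract's claim that the ordering covers a broad range of parameter settings.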
