Design and evaluation of clustering criterion for optimal hierarchical agglomerative clustering.

Abstract

Clustering techniques are widely used to retrieve meaningful patterns hidden in data whose structure is unknown. Although more effective and efficient clustering algorithms have been developed recently, most of them still suffer from uncertainty about clustering optimality. This thesis designs a clustering criterion to resolve that uncertainty. The criterion is designed for hierarchical agglomerative clustering methods and helps each of them find its own optimal clustering. It can also be used to estimate the desired number of clusters for partitional clustering algorithms. Because it requires no a priori parameters, use of the criterion does not depend on a particular clustering algorithm.

The criterion is based on the squared error method, which has been widely used as an evaluation measure for clustering techniques. Using the traditional concept of entropy, we interpret clustering as a search process that discovers an optimal configuration at the minimum clustering entropy. The existence of clustering optimality is proved in multidimensional Euclidean metric space using the opposite concept of entropy, clustering gain. The minimum entropy represents the best tradeoff between two competing trends, the intra-cluster and inter-cluster error sums. Optimal clustering is achieved when a hierarchical agglomerative clustering algorithm stops building the dendrogram at the global minimum of clustering entropy. The desired number of clusters and the initial centroids of the clusters can be estimated from the best configuration among many optimal configurations, and they can then be provided to non-hierarchical partitional clustering methods.

Experimental results illustrate that popular partitional clustering algorithms converge quickly to their optimal clustering configurations when given the estimated number of clusters and initial centroids. In addition, a new weighting scheme with a dimension-compression technique that improves retrieval effectiveness and classification performance is presented. The clustering criterion therefore offers a promising technique for achieving a higher level of quality across a wide range of clustering techniques.
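
The stopping rule described in the abstract can be illustrated with a minimal sketch. The abstract does not define the two error sums or how they are combined, so the following are assumptions made only for illustration: the intra-cluster error sum is taken as the total squared distance from each point to its cluster centroid, the inter-cluster error sum as the total squared distance from each cluster centroid to the global centroid, and the "clustering entropy" as their unweighted sum. Centroid linkage is also an arbitrary choice, and the function names are hypothetical. The sketch builds the full dendrogram and keeps the level at which the score is lowest.

```python
from itertools import combinations


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def error_sums(clusters, global_centroid):
    """Return (intra, inter) squared-error sums (assumed definitions)."""
    intra = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    inter = sum(sq_dist(centroid(c), global_centroid) for c in clusters)
    return intra, inter


def agglomerate_with_criterion(points):
    """Merge clusters bottom-up and keep the level with the lowest score.

    Returns (best_clusters, history), where history holds one
    (number_of_clusters, score) pair per dendrogram level.
    """
    clusters = [[p] for p in points]
    g = centroid(points)
    best_score, best_clusters = None, None
    history = []
    while True:
        intra, inter = error_sums(clusters, g)
        score = intra + inter  # assumption: equal weighting of both sums
        history.append((len(clusters), score))
        if best_score is None or score < best_score:
            best_score, best_clusters = score, [list(c) for c in clusters]
        if len(clusters) == 1:
            break
        # Centroid linkage: merge the two clusters with the closest centroids.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                   centroid(clusters[ij[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return best_clusters, history


# Two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
best, history = agglomerate_with_criterion(points)
print(len(best))                            # number of clusters at the minimum
print([centroid(c) for c in best])          # candidate initial centroids
```

Under these assumptions, the number of clusters and the centroids at the minimum are the quantities the abstract says can be handed to a partitional method such as k-means as its cluster count and initial centroids.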

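Bibliographic record

  • Author: Jung, Yunjae
  • Author affiliation: University of Minnesota
  • Degree-granting institution: University of Minnesota
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 108 p.
  • Total pages: 108
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology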
