Ecology and Evolution

Silhouette width using generalized mean—A flexible method for assessing clustering efficiency

Abstract

Cluster analysis plays a vital role in pattern recognition in several fields of science. Silhouette width is a widely used index for assessing the fit of individual objects in a classification, as well as the quality of clusters and of the entire classification. Silhouette width combines two clustering criteria, compactness and separation, which implies that spherical cluster shapes are preferred over others. This property can be a disadvantage in the presence of complex, nonspherical clusters, which are common in real situations. We suggest a generalization of the silhouette width using the generalized mean. By changing the parameter of the generalized mean between −∞ and +∞, several specific summary statistics can be reproduced, including the minimum, the maximum, and the arithmetic, harmonic, and geometric means. Implementing the generalized mean in the calculation of silhouette width allows the sensitivity of the index to compactness versus connectedness to be changed. With higher sensitivity to connectedness, the preference of silhouette width for spherical clusters should decrease. We test the performance of the generalized silhouette width on artificial data sets and on the Iris data set. We examine how classifications with different numbers of clusters, produced by different algorithms, are evaluated when the parameter is set to different values. When the parameter was negative, well-separated clusters achieved high silhouette widths despite their elongated or circular shapes. Positive values of the parameter increased the importance of compactness; hence, the preference for spherical clusters became even more pronounced. With low parameter values, single linkage clustering was deemed the most efficient clustering method, while with higher parameter values group average, complete linkage, and beta flexible clustering with beta = −0.25 seemed to perform better.
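The two ingredients named above can be written out as follows. This is a sketch under our own assumptions: the notation (p, M_p, a_p, b_p, s_p) is ours, not necessarily the paper's, and we assume the generalized mean simply replaces the arithmetic mean in both terms of the ordinary silhouette width, where d(i, j) is the dissimilarity between objects i and j and C(i) is the cluster containing object i.

```latex
M_p(x_1,\dots,x_n) =
\begin{cases}
\left(\dfrac{1}{n}\sum_{k=1}^{n} x_k^{\,p}\right)^{1/p}, & p \neq 0,\\[6pt]
\left(\prod_{k=1}^{n} x_k\right)^{1/n}, & p = 0,
\end{cases}
\qquad
\lim_{p \to -\infty} M_p = \min_k x_k, \qquad
\lim_{p \to +\infty} M_p = \max_k x_k

a_p(i) = M_p\bigl(\{\, d(i,j) : j \in C(i),\ j \neq i \,\}\bigr), \qquad
b_p(i) = \min_{C \neq C(i)} M_p\bigl(\{\, d(i,j) : j \in C \,\}\bigr)

s_p(i) = \frac{b_p(i) - a_p(i)}{\max\{a_p(i),\, b_p(i)\}}
```

Here p = −1, 0, and 1 give the harmonic, geometric, and arithmetic means, so under this assumption p = 1 recovers the ordinary silhouette width. As p → −∞, a_p(i) becomes the distance to the nearest member of the object's own cluster, which rewards connected (for example, elongated) clusters; as p → +∞ it becomes the distance to the farthest member, which rewards compact ones.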
The generalized silhouette allows for adjusting the contribution of compactness and connectedness criteria, thus avoiding underestimation of clustering efficiency in the presence of clusters with high internal heterogeneity.
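A minimal pure-Python sketch of a generalized silhouette width, computed from a precomputed dissimilarity matrix. It assumes that the generalized (power) mean with exponent p replaces the arithmetic mean both in the within-cluster term a(i) and in the nearest-other-cluster term b(i); the function names `generalized_mean` and `generalized_silhouette` are hypothetical and do not come from the paper or any package. Singleton clusters get width 0 (the usual silhouette convention), and the overall width is taken here as the arithmetic mean of the per-object widths, which is also an assumption.

```python
import math
from typing import Hashable, List, Sequence


def generalized_mean(values: Sequence[float], p: float) -> float:
    """Generalized (power) mean M_p of non-negative values.

    p = -inf gives the minimum, p = -1 the harmonic mean, p = 0 the geometric
    mean, p = 1 the arithmetic mean, and p = +inf the maximum.
    """
    if not values:
        raise ValueError("the generalized mean of an empty sequence is undefined")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    if p == math.inf:
        return float(max(values))
    if p == -math.inf:
        return float(min(values))
    if p <= 0 and any(v == 0 for v in values):
        # For p <= 0 a single zero drives the mean to 0 (limit convention).
        return 0.0
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def generalized_silhouette(
    dist: Sequence[Sequence[float]], labels: Sequence[Hashable], p: float
) -> List[float]:
    """Per-object silhouette widths with a(i) and b(i) computed by M_p.

    dist is a symmetric dissimilarity matrix; labels[i] is the cluster of object i.
    """
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            # Singleton cluster: width set to 0 by convention.
            widths.append(0.0)
            continue
        a = generalized_mean(own, p)
        # b(i): smallest generalized-mean dissimilarity to any other cluster.
        b = min(
            generalized_mean([dist[i][j] for j in range(n) if labels[j] == c], p)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


if __name__ == "__main__":
    # Toy 1-D data: an elongated six-point chain and a compact pair.
    points = [0, 1, 2, 3, 4, 5, 10, 11]
    labels = ["chain"] * 6 + ["pair"] * 2
    dist = [[abs(x - y) for y in points] for x in points]
    for p in (-math.inf, -1, 0, 1, math.inf):
        w = generalized_silhouette(dist, labels, p)
        # Overall width taken as the arithmetic mean of per-object widths.
        print(f"p = {p}: average silhouette width = {sum(w) / len(w):.3f}")
```

On this toy data the average width at p = −inf is higher than at p = 1, which in turn is higher than at p = +inf: the chain is well connected but not compact, so it is judged most favourably when the index emphasizes connectedness.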