
The effect of low number of points in clustering validation via the negentropy increment



Abstract

We recently introduced the negentropy increment, a validity index for crisp clustering that uses the negentropy to quantify the average normality of the clustering partitions. This index deals satisfactorily with clusters of heterogeneous orientations, scales and densities. One of its main advantages is the simplicity of its calculation, which requires only the log-determinants of the covariance matrices and the prior probabilities of each cluster. The negentropy increment provides validation results that are in general better than those from other classic cluster validity indices. However, when the number of data points in a partition region is small, the estimate of the log-determinant of the covariance matrix can be very poor. This affects the proper quantification of the index and therefore the quality of the clustering, so additional requirements, such as a minimum number of points in each region, are needed. Although constraints of this kind can provide good results, they need to be adjusted depending on parameters such as the dimension of the data space. In this article we investigate how the estimation of the negentropy increment of a clustering partition is affected by the presence of regions with a small number of points. We find that the error in this estimation depends on the number of points in each region, but not on the scale or orientation of their distribution, and show how to correct this error in order to obtain an unbiased estimator of the negentropy increment. We also quantify the uncertainty in the estimation. As we show, for both 2D synthetic problems and multidimensional real benchmark problems, these results can be used to validate clustering partitions with a substantial improvement.
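The two quantities the abstract names, per-region log-determinants and prior probabilities, can be illustrated with a minimal Python sketch. The abstract does not state the exact formula, so the combination used in `partition_score` (half the prior-weighted log-determinants minus the prior-weighted log-priors) is an assumption, and every function name here is hypothetical. The second function is a plain Monte Carlo check, assuming Gaussian data, of the claim that the log-determinant error depends on the number of points but not on scale or orientation; it is not the correction derived in the article.

```python
import numpy as np


def logdet_cov(points):
    """Log-determinant of the sample covariance of an (n, d) array."""
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    _sign, logdet = np.linalg.slogdet(cov)
    return logdet


def partition_score(data, labels):
    """Assumed partition-dependent part of the index (hypothetical form):
    0.5 * sum_i p_i * log|Sigma_i| - sum_i p_i * log(p_i),
    where p_i is the fraction of points in region i.
    Each region needs more than d points for a finite log-determinant."""
    n_total = len(labels)
    score = 0.0
    for k in np.unique(labels):
        region = data[labels == k]
        p = len(region) / n_total
        score += 0.5 * p * logdet_cov(region) - p * np.log(p)
    return float(score)


def mean_logdet_error(n, d, transform, trials=2000, seed=0):
    """Average (estimated - true) log-determinant over many Gaussian
    samples of size n whose true covariance is transform @ transform.T."""
    rng = np.random.default_rng(seed)
    true_logdet = np.linalg.slogdet(transform @ transform.T)[1]
    errors = []
    for _ in range(trials):
        x = rng.standard_normal((n, d)) @ transform.T
        errors.append(logdet_cov(x) - true_logdet)
    return float(np.mean(errors))


if __name__ == "__main__":
    identity = np.eye(2)
    stretched = np.array([[3.0, 1.0], [0.0, 0.2]])  # arbitrary scale and shear
    for n in (5, 20, 200):
        print(n,
              mean_logdet_error(n, 2, identity),
              mean_logdet_error(n, 2, stretched))
```

With a fixed seed the two columns printed for each `n` should agree up to rounding, because a linear change of coordinates shifts the estimated and the true log-determinant by the same amount. The error should shrink toward zero as `n` grows.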

Bibliographic record

  • Source
    Neurocomputing | 2011, Issue 16 | pp. 2657-2664 | 8 pages
  • Author affiliations

    Departamento de Ingenieria Informatica, Escuela Politecnica Superior, Universidad Autonoma de Madrid, 28049 Madrid, Spain;

    Departamento de Ingenieria Informatica, Escuela Politecnica Superior, Universidad Autonoma de Madrid, 28049 Madrid, Spain;

    Cognodata Consulting, Calle Caracas 23, 28010 Madrid, Spain;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English
  • Keywords

    crisp clustering; cluster validation; negentropy increment;

  • Date added to database: 2022-08-18 02:08:15
