The effect of low number of points in clustering validation via the negentropy increment

Luis F. Lago-Fernández; Manuel Sánchez-Montañés; Fernando Corbacho


Abstract

We recently introduced the negentropy increment, a validity index for crisp clustering that uses the negentropy to quantify the average normality of the clustering partitions. This index deals satisfactorily with clusters of heterogeneous orientations, scales and densities. One of its main advantages is the simplicity of its calculation, which requires only the log-determinants of the covariance matrices and the prior probabilities of each cluster. The negentropy increment provides validation results that are, in general, better than those from other classic cluster validity indices. However, when the number of data points in a partition region is small, the estimate of the log-determinant of the covariance matrix can be very poor. This affects the proper quantification of the index and therefore the quality of the clustering, so additional requirements, such as a limit on the minimum number of points in each region, are needed. Although constraints of this kind can provide good results, they need to be adjusted depending on parameters such as the dimension of the data space. In this article we investigate how the estimation of the negentropy increment of a clustering partition is affected by the presence of regions with a small number of points. We find that the error in this estimation depends on the number of points in each region, but not on the scale or orientation of their distribution, and we show how to correct this error to obtain an unbiased estimator of the negentropy increment. We also quantify the uncertainty in the estimation. As we show, for both 2D synthetic problems and multidimensional real benchmark problems, these results substantially improve the validation of clustering partitions.
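To make the calculation described in the abstract concrete, here is a minimal Python sketch of an index built only from per-cluster log-determinants and prior probabilities. The abstract does not state the formula; the expression used below is an assumption based on the usual presentation of the negentropy increment in the authors' earlier work, and the function names (`log_det_cov`, `negentropy_increment`) and the synthetic data are hypothetical. The sketch implements the plain, uncorrected estimator, not the bias correction this article proposes.

```python
import numpy as np


def log_det_cov(points):
    """Log-determinant of the sample covariance matrix of an (n, d) array."""
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix is singular; too few points in region")
    return logdet


def negentropy_increment(data, labels):
    """Uncorrected negentropy increment, in an ASSUMED form:

        dJ = 1/2 * sum_i p_i * log|S_i| - 1/2 * log|S_0| - sum_i p_i * log(p_i)

    S_i: covariance of cluster i, S_0: covariance of the whole data set,
    p_i: fraction of points assigned to cluster i.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = data[labels == k]
        p = len(members) / len(data)
        total += 0.5 * p * log_det_cov(members) - p * np.log(p)
    return total - 0.5 * log_det_cov(data)


rng = np.random.default_rng(0)
# Two well-separated 2D Gaussian blobs (synthetic, for illustration only).
blob_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
blob_b = rng.normal(loc=[8.0, 8.0], scale=1.0, size=(200, 2))
data = np.vstack([blob_a, blob_b])

two_clusters = np.repeat([0, 1], 200)      # the generating partition
one_cluster = np.zeros(400, dtype=int)     # everything in a single region

dj_two = negentropy_increment(data, two_clusters)
dj_one = negentropy_increment(data, one_cluster)
```

Under the assumed formula, the single-region partition gives a value of zero and the two-blob partition gives a lower value, so lower values correspond to partitions whose regions look more Gaussian. The value is also unchanged when the whole data set is rescaled, because the log-determinant shifts cancel. When a region holds only a handful of points, `log_det_cov` becomes unreliable, which is the small-sample problem the abstract addresses.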


Bibliographic details

Source
Neurocomputing, 2011, Issue 16, pp. 2657-2664 (8 pages)
Authors
Luis F. Lago-Fernández; Manuel Sánchez-Montañés; Fernando Corbacho
Affiliations

Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad Autónoma de Madrid, 28049 Madrid, Spain;

Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad Autónoma de Madrid, 28049 Madrid, Spain;

Cognodata Consulting, Calle Caracas 23, 28010 Madrid, Spain

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
crisp clustering; cluster validation; negentropy increment

Added to database: 2022-08-18 02:08:15

