Information-Theoretic Distance Measures for Clustering Validation: Generalization and Normalization

Ping Luo; Hui Xiong; Guoxing Zhan; Junjie Wu; Zhongzhi Shi

Abstract

This paper studies the generalization and normalization of information-theoretic distance measures for clustering validation. We first introduce a uniform representation of these distance measures, called the quasi-distance, which is induced from a general form of conditional entropy. The quasi-distance has three properties: symmetry, the triangle law, and the minimum reachable property. These properties make the quasi-distance a natural external measure for clustering validation. We also observe that the range of a distance measure differs from one data set to another when it is applied to clustering validation. Therefore, when comparing the performance of clustering algorithms across data sets, distance normalization is required to equalize the ranges of the distance measures. A critical challenge for distance normalization is obtaining the range of a distance measure for a given data set. To that end, we theoretically analyze how to compute the maximum value of a distance measure for a data set. Finally, we compare the performance of the partition clustering algorithm K-means on various real-world data sets. The experiments show that the normalized distance measures perform better than the original distance measures when comparing clusterings of different data sets, and that the normalized Shannon distance performs best among the four distance measures studied.
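
As a rough illustration of the idea in the abstract, the sketch below computes a Shannon-type distance between two clusterings of the same data set as the sum of the two conditional entropies, H(A|B) + H(B|A), and then rescales it. Several things here are assumptions and not taken from the paper: the function names are hypothetical, only the Shannon case is shown (not the general quasi-distance family), and the normalization divides by log n, a generic upper bound on this distance for n objects, not the data-set-specific maximum that the authors analyze.

```python
import math
from collections import Counter


def conditional_entropy(a, b):
    """Shannon conditional entropy H(A | B), in nats, of two label sequences."""
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same objects")
    n = len(a)
    joint = Counter(zip(a, b))   # co-occurrence counts n_xy
    b_counts = Counter(b)        # cluster sizes in B
    h = 0.0
    for (_, y), n_xy in joint.items():
        h -= (n_xy / n) * math.log(n_xy / b_counts[y])
    return h


def shannon_distance(a, b):
    """H(A|B) + H(B|A): symmetric, and zero when the partitions coincide."""
    return conditional_entropy(a, b) + conditional_entropy(b, a)


def normalized_shannon_distance(a, b):
    """Assumed normalization: divide by log n (a generic bound, not the paper's maximum)."""
    n = len(a)
    if n < 2:
        return 0.0
    return shannon_distance(a, b) / math.log(n)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    relabeled = [1, 1, 0, 0]   # same partition, different label names
    crossed = [0, 1, 0, 1]     # splits every true class in half
    print(shannon_distance(truth, relabeled))
    print(normalized_shannon_distance(truth, crossed))
```

Because the raw distance grows with the number of objects and clusters, values from different data sets are not directly comparable; dividing by a maximum that depends on the data set puts them on a common [0, 1] scale, which is the motivation the abstract gives for normalization.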

Bibliographic details

Source: Knowledge and Data Engineering, IEEE Transactions on, 2009, Issue 9, pp. 1249-1262 (14 pages)
Authors: Ping Luo; Hui Xiong; Guoxing Zhan; Junjie Wu; Zhongzhi Shi
Author affiliation: Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
Original format: PDF
Language: English
Keywords: entropy; pattern clustering; K-means; clustering validation; conditional entropy; distance normalization; generalization issues; information-theoretic distance measures; minimum reachable property; partition clustering algorithm; symmetry property; triangle law property; K-means clustering
