Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number

Cheung Y.-M.; Jia H.


Abstract

Most existing clustering approaches are applicable to purely numerical or purely categorical data, but not to both. In general, clustering mixed data composed of numerical and categorical attributes is a nontrivial task because there is an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering framework based on the concept of object-cluster similarity and gives a unified similarity metric that can be applied directly to data with categorical, numerical, or mixed attributes. Accordingly, an iterative clustering algorithm is developed, and its performance is demonstrated experimentally on different benchmark data sets. Moreover, to circumvent the difficult problem of selecting the number of clusters, we further develop a penalized competitive learning algorithm within the proposed clustering framework. The embedded competition and penalization mechanisms enable this improved algorithm to determine the number of clusters automatically by gradually eliminating redundant clusters. The experimental results show the efficacy of the proposed approach.
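The abstract does not give the definition of the unified metric, so the following is only an illustrative sketch of what an object-cluster similarity for mixed data might look like, not the authors' formulation. It assumes a frequency-based similarity for each categorical attribute and a softmax over negative half squared distances to cluster centroids for the numerical part, averaged with equal weights. All function names and the data layout are hypothetical.

```python
import math
from collections import Counter


def categorical_similarity(value, cluster_values):
    """Fraction of cluster members that share `value` on one categorical attribute."""
    if not cluster_values:
        return 0.0
    return Counter(cluster_values)[value] / len(cluster_values)


def numerical_similarities(vec, centroids):
    """Assumed numerical part: softmax of -0.5 * squared distance to each centroid."""
    scores = [
        math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(vec, c)))
        for c in centroids
    ]
    total = sum(scores)
    return [s / total for s in scores]


def object_cluster_similarity(obj, clusters):
    """Similarity of one mixed-type object to every cluster (illustrative only).

    obj      -- (categorical_tuple, numerical_tuple)
    clusters -- list of non-empty lists of objects with the same layout
    """
    cat, num = obj
    d_c = len(cat)
    # Centroid of the numerical part of each cluster.
    centroids = [
        [sum(col) / len(col) for col in zip(*(m[1] for m in members))]
        for members in clusters
    ]
    num_sims = numerical_similarities(num, centroids)
    sims = []
    for j, members in enumerate(clusters):
        cat_sum = sum(
            categorical_similarity(cat[r], [m[0][r] for m in members])
            for r in range(d_c)
        )
        # Equal weight for each categorical attribute and the numerical block
        # (an assumption made for this sketch).
        sims.append((cat_sum + num_sims[j]) / (d_c + 1))
    return sims


def assign(obj, clusters):
    """Index of the cluster with the highest similarity to `obj`."""
    sims = object_cluster_similarity(obj, clusters)
    return max(range(len(clusters)), key=lambda j: sims[j])


# Toy data: two clusters of (categorical, numerical) objects.
clusters = [
    [(("red", "s"), (0.0, 0.1)), (("red", "m"), (0.2, 0.0))],
    [(("blue", "l"), (5.0, 5.1)), (("blue", "l"), (4.8, 5.0))],
]
obj = (("red", "s"), (0.1, 0.1))
sims = object_cluster_similarity(obj, clusters)
best = assign(obj, clusters)
```

An iterative clustering loop in this spirit would repeatedly assign each object to its most similar cluster and recompute the cluster statistics until the assignments stop changing. The penalized competitive learning variant described in the abstract is not sketched here because the abstract gives no details of its update rules.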


Bibliographic details

Source: Pattern Recognition: The Journal of the Pattern Recognition Society | 2013, Issue 8 | 11 pages
Authors: Cheung Y.-M.; Jia H.
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Categorical attribute; Clustering; Number of clusters; Numerical attribute; Similarity metric


Similar literature


1. Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number [J]. Cheung Y.-M., Jia H. Pattern Recognition: The Journal of the Pattern Recognition Society. 2013, Issue 8
2. Shrinkage-based similarity metric for cluster analysis of microarray data [J]. Cherepinsky V., Feng JW., Rejali M. Proceedings of the National Academy of Sciences of the United States of America. 2003, Issue 17
3. Testing the homogeneity of proportions for clustered binary data without knowing the correlation structure [J]. Tsou Tsung-Shan, Liu Hsiao-Yun. Journal of Applied Statistics. 2015, Issue 7a8
4. A Unified Metric for Categorical and Numerical Attributes in Data Clustering [C]. Yiu-ming Cheung, Hong Jia. Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2013
5. Effects of similarity metrics on document clustering [D]. Veni, Rushikesh. 2009
6. Shrinkage-based similarity metric for cluster analysis of microarray data [O]. Vera Cherepinsky, Jiawu Feng, Marc Rejali. 2003
7. Similarity Metric Based on Resistance Distance and Its Applications to Data Clustering* [O]. Gengqi Guo, Wenjun Xiao, Bin Lu. 2016
