A dissimilarity measure for the k-Modes clustering algorithm

Fuyuan Cao; Jiye Liang; Deyu Li; Liang Bai; Chuangyin Dang


Abstract

Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. The problem of clustering categorical data has recently attracted much attention from the data mining research community. As an extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes. In this paper, the limitations of the simple matching dissimilarity measure and Ng's dissimilarity measure are analyzed using some illustrative examples. Drawing on ideas from biological and genetic taxonomy and on the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined. A distinct characteristic of the new dissimilarity measure is that it takes into account the distribution of attribute values over the whole universe. A convergence study and a time complexity analysis of the k-Modes algorithm based on the new dissimilarity measure indicate that it can be used effectively for large data sets. The results of comparative experiments on synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure, especially on data sets with biological and genetic taxonomy information.
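For orientation, the baseline that the abstract refers to (k-Modes with the simple matching dissimilarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the new dissimilarity measure proposed in the paper is not reproduced here, and the function names (`simple_matching`, `mode_of`, `k_modes`), the toy data, and the choice of initial modes are all hypothetical.

```python
from collections import Counter


def simple_matching(x, y):
    """Simple matching dissimilarity: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def mode_of(records):
    """Attribute-wise most frequent value (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, init_modes, max_iter=100):
    """Baseline k-Modes: alternate assignment and mode update until labels stop changing."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: simple_matching(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = mode_of(members)
    return labels, modes


# Hypothetical toy data: six records with three categorical attributes.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r"),
]
labels, modes = k_modes(data, init_modes=[data[0], data[3]])
```

Simple matching counts only whether two values agree on each attribute, so it ignores how attribute values are distributed over the whole data set; according to the abstract, that distribution is what the new measure takes into account.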


Bibliographic details

Source
Knowledge-Based Systems | 2012, Issue 2012 | pp. 120-127 | 8 pages
Authors
Fuyuan Cao; Jiye Liang; Deyu Li; Liang Bai; Chuangyin Dang
Author affiliations

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China;

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China;

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China;

Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China;

Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong, China;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification
Keywords
categorical data clustering; k-modes algorithm; rough membership function; dissimilarity measure; genetic taxonomy;


