
A dissimilarity measure for the k-Modes clustering algorithm


Abstract

Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. The problem of clustering categorical data has recently attracted much attention from the data mining research community. As an extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes. In this paper, the limitations of the simple matching dissimilarity measure and Ng's dissimilarity measure are analyzed using illustrative examples. Based on the idea of biological and genetic taxonomy and on the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined. A distinctive characteristic of the new dissimilarity measure is that it takes into account the distribution of attribute values over the whole universe. A convergence study and a time-complexity analysis of the k-Modes algorithm based on the new dissimilarity measure indicate that it can be used effectively on large data sets. Comparative experiments on synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure, especially on data sets with biological and genetic taxonomy information.
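The following Python sketch serves as a reference point for the two baseline measures the abstract criticizes. It shows the k-Modes loop (attribute-wise modes in place of means) with the simple matching dissimilarity, and with Ng's frequency-weighted measure in the form in which it is usually stated in the literature. It is not the paper's rough-membership-based measure, which is not reproduced here. The function names are hypothetical. The fallback to simple matching before any clusters exist and the handling of empty clusters are assumptions of this sketch.

```python
from collections import Counter
import random

def simple_matching(x, z):
    """Simple matching dissimilarity: count of attributes on which x and z differ."""
    return sum(1 for a, b in zip(x, z) if a != b)

def ng_dissimilarity(x, z, cluster):
    """Frequency-weighted measure attributed to Ng et al. (assumed form).

    A mismatch costs 1; a match on attribute j costs 1 - n_j / n, where n is the
    cluster size and n_j the number of members whose value on j equals the mode's.
    With no cluster members yet, fall back to simple matching (sketch assumption).
    """
    if not cluster:
        return simple_matching(x, z)
    n = len(cluster)
    total = 0.0
    for j, (a, b) in enumerate(zip(x, z)):
        if a != b:
            total += 1.0
        else:
            n_j = sum(1 for y in cluster if y[j] == b)
            total += 1.0 - n_j / n
    return total

def mode_of(cluster):
    """Attribute-wise most frequent value: the k-Modes replacement for the mean."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))

def k_modes(data, k, dissim="simple", max_iter=20, seed=0):
    """Alternate assignment and mode update until the assignment stops changing."""
    if dissim not in ("simple", "ng"):
        raise ValueError("dissim must be 'simple' or 'ng'")
    rng = random.Random(seed)
    modes = rng.sample(data, k)              # initial modes: k records sampled without replacement
    labels = [None] * len(data)
    clusters = [[] for _ in range(k)]        # cluster members from the previous pass
    for _ in range(max_iter):
        new_labels = []
        for x in data:
            if dissim == "simple":
                d = [simple_matching(x, z) for z in modes]
            else:
                d = [ng_dissimilarity(x, z, clusters[l]) for l, z in enumerate(modes)]
            new_labels.append(min(range(k), key=d.__getitem__))
        if new_labels == labels:             # converged: no object changed cluster
            break
        labels = new_labels
        clusters = [[x for x, l in zip(data, labels) if l == c] for c in range(k)]
        # keep the previous mode if a cluster became empty (sketch assumption)
        modes = [mode_of(c) if c else modes[i] for i, c in enumerate(clusters)]
    return labels, modes
```

For example, `k_modes(rows, k=2, dissim="ng")` clusters a list of equal-length tuples of category labels. Note that Ng's measure only looks at value frequencies inside each cluster. The abstract's point is that the new measure also considers how attribute values are distributed over the whole universe.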

Bibliographic details

  • Source
    Knowledge-Based Systems | 2012 | pp. 120-127 | 8 pages
  • Author affiliations

    Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China (four authors);

    Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong, China;

  • Original format: PDF
  • Language: English
  • Keywords

    categorical data clustering; k-modes algorithm; rough membership function; dissimilarity measure; genetic taxonomy;

