
MGR: An information theory based hierarchical divisive clustering algorithm for categorical data


Abstract

Categorical data clustering has attracted much attention recently because much of the data contained in today's databases is categorical in nature. While many algorithms for clustering categorical data have been proposed, some have low clustering accuracy while others have high computational complexity. This research proposes mean gain ratio (MGR), a new information theory based hierarchical divisive clustering algorithm for categorical data. MGR performs clustering from the attributes viewpoint, which involves two steps: selecting a clustering attribute using mean gain ratio, and selecting an equivalence class on the clustering attribute using entropy of clusters. It can be run with or without specifying the number of clusters, whereas few existing clustering algorithms for categorical data can be run without specifying the number of clusters. Experimental results on nine University of California at Irvine (UCI) benchmark data sets and ten synthetic data sets show that MGR outperforms baseline algorithms in terms of clustering performance and efficiency.
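The attribute-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the conventional gain ratio (information gain divided by the entropy of the partitioning attribute) and assumes the mean is taken over all other attributes; the paper's exact definitions may differ. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given), averaged over the equivalence classes of `given`."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    return sum(len(vals) / n * entropy(vals) for vals in groups.values())


def gain_ratio(target, given):
    """Information gain of `given` about `target`, divided by H(given).

    Assumption: this is the conventional gain ratio, which may not match
    the paper's definition exactly.
    """
    split_info = entropy(given)
    if split_info == 0:
        return 0.0  # a constant attribute partitions nothing
    return (entropy(target) - conditional_entropy(target, given)) / split_info


def mean_gain_ratio(rows, attr):
    """Average gain ratio of `attr` with respect to every other attribute."""
    others = [a for a in rows[0] if a != attr]
    column = [r[attr] for r in rows]
    total = sum(gain_ratio([r[a] for r in rows], column) for a in others)
    return total / len(others)


def select_clustering_attribute(rows):
    """Pick the attribute with the largest mean gain ratio."""
    return max(rows[0], key=lambda a: mean_gain_ratio(rows, a))


if __name__ == "__main__":
    # Hypothetical toy data set: each row is one object.
    rows = [
        {"color": "red", "shape": "round", "size": "small"},
        {"color": "red", "shape": "round", "size": "small"},
        {"color": "blue", "shape": "square", "size": "small"},
        {"color": "blue", "shape": "square", "size": "large"},
    ]
    for attr in rows[0]:
        print(attr, round(mean_gain_ratio(rows, attr), 4))
    print("clustering attribute:", select_clustering_attribute(rows))
```

In this sketch, the equivalence classes of the chosen attribute (the groups of objects sharing one value) would then be the candidates for the second step, where the abstract says one class is selected using the entropy of clusters.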

Bibliographic details

  • Source
    Knowledge-Based Systems | 2014, Issue 9 | pp. 401-411 | 11 pages
  • Author affiliations

    Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, Gambang, 26300 Kuantan, Malaysia, College of Computer Science & Engineering, Northwest Normal University, 730070 Lanzhou Gansu, PR China;

    Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, Gambang, 26300 Kuantan, Malaysia, College of Computer Science & Engineering, Northwest Normal University, 730070 Lanzhou Gansu, PR China;

    Faculty of Computer Science and Information Technology, University of Malaya, 50603 Pantai Valley, Kuala Lumpur, Malaysia;

    Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Lebuhraya Tun Razak, Gambang, 26300 Kuantan, Malaysia;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Data mining; Clustering; Categorical data; Gain ratio; Information theory;

