
The k-modes type clustering plus between-cluster information for categorical data



Abstract

The k-modes algorithm and its modified versions are widely used to cluster categorical data. However, in the iterative process of these algorithms, the updating formulae for quantities such as the partition matrix, cluster centers and attribute weights are computed from within-cluster information only. Between-cluster information is not considered, which may yield clustering results with weak separation among different clusters. Therefore, in this paper, we propose a new term that reflects the separation. Furthermore, new optimization objective functions are developed by adding the proposed term to the objective functions of several existing k-modes algorithms. Under the optimization framework, the corresponding updating formulae and the convergence of the iterative process are rigorously derived. These improvements enhance the effectiveness of the existing k-modes algorithms whilst keeping them simple. Experimental studies on real data sets from the UCI (University of California Irvine) Machine Learning Repository illustrate that the improved algorithms outperform their original counterparts in clustering categorical data sets and are also scalable to large data sets, owing to their linear time complexity with respect to the number of data objects, attributes or clusters.
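The abstract does not state the proposed term or the updating formulae, so the following is only a minimal sketch of the general idea: a plain k-modes loop whose center update can optionally be penalized by how often a candidate value occurs outside the cluster. The penalty form, the `gamma` parameter and all function names are assumptions made for illustration; they are not the formulae derived in the paper. With `gamma = 0` the update reduces to the ordinary per-attribute mode.

```python
from collections import Counter


def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(data, centers):
    """Assign each object to its nearest center (ties go to the lowest index)."""
    return [min(range(len(centers)), key=lambda l: mismatches(row, centers[l]))
            for row in data]


def update_centers(data, labels, centers, gamma=0.0):
    """Recompute every center attribute by attribute.

    Score of a candidate value v for cluster l:
        (count of v inside cluster l) - gamma * (count of v outside cluster l)
    The penalty is an ASSUMED stand-in for a between-cluster term,
    not the term proposed in the paper.
    """
    n_attrs = len(data[0])
    new_centers = []
    for l, old_center in enumerate(centers):
        members = [row for row, lab in zip(data, labels) if lab == l]
        if not members:
            # Keep the previous center when a cluster becomes empty.
            new_centers.append(old_center)
            continue
        center = []
        for j in range(n_attrs):
            inside = Counter(row[j] for row in members)
            total = Counter(row[j] for row in data)
            # sorted() makes tie-breaking deterministic (smallest value wins).
            best = max(sorted(inside),
                       key=lambda v: inside[v] - gamma * (total[v] - inside[v]))
            center.append(best)
        new_centers.append(tuple(center))
    return new_centers


def k_modes(data, init_centers, gamma=0.0, max_iter=100):
    """Alternate assignment and center updates until the labels stop changing."""
    centers = list(init_centers)
    labels = assign(data, centers)
    for _ in range(max_iter):
        centers = update_centers(data, labels, centers, gamma)
        new_labels = assign(data, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


if __name__ == "__main__":
    toy = [("a", "x"), ("a", "x"), ("a", "y"),
           ("b", "y"), ("b", "y"), ("b", "x")]
    print(k_modes(toy, [("a", "x"), ("b", "y")]))
```

Each pass of this sketch touches every object, attribute and cluster a bounded number of times, so its per-iteration cost grows linearly in each of those three quantities, in line with the scalability claim in the abstract. No convergence guarantee is claimed for the penalized update; the `max_iter` cap is there for that reason.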

Bibliographic details

  • Source
    Neurocomputing | 2014, Issue 10 | pp. 111-121 | 11 pages
  • Authors

    Liang Bai; Jiye Liang;

  • Author affiliations

    Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China;

    Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Cluster analysis; Categorical data; The k-modes type algorithms; Optimization objective function; The between-cluster information;

