
Holo-Entropy Based Categorical Data Hierarchical Clustering


Abstract

Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume feature independence when computing similarity between data objects, or make use of computationally demanding techniques such as PCA for numerical data. Hierarchical clustering algorithms are often based on similarity measures computed on a common feature space, which is not effective when clustering high-dimensional data. Subspace clustering algorithms discover feature subspaces for clusters, but are mostly partition-based; i.e., they do not produce a hierarchical structure of clusters. In this paper, we propose a hierarchical algorithm for clustering high-dimensional categorical data, based on a recently proposed information-theoretic concept named holo-entropy. The algorithm proposes new ways of exploring entropy, holo-entropy and attribute weighting in order to determine the feature subspace of a cluster and to merge clusters even though their feature subspaces differ. The algorithm is tested on UCI datasets, and compared with several state-of-the-art algorithms. Experimental results show that the proposed algorithm yields higher efficiency and accuracy than the competing algorithms and allows higher reproducibility.
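
The abstract names holo-entropy without defining it. As a rough illustration only, the sketch below assumes the definition used in earlier work on categorical outlier detection, where the holo-entropy of a set of objects is the sum of the Shannon entropies of its individual attributes. The function names and the toy data are hypothetical, and the attribute weighting and subspace selection of the proposed algorithm are not reproduced here.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def holo_entropy(rows):
    """Assumed definition: sum of per-attribute entropies over a set of objects.

    `rows` is a list of equal-length tuples, one tuple per categorical object.
    """
    columns = zip(*rows)  # transpose objects into attribute columns
    return sum(attribute_entropy(list(col)) for col in columns)


# Hypothetical toy cluster with three categorical attributes.
cluster = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("red", "large", "square"),
]
print(holo_entropy(cluster))
```

In this toy cluster the first attribute is constant and contributes 0 bits, while the other two are evenly split and contribute 1 bit each, so the total is 2 bits. Under this assumed definition, attributes with low entropy inside a cluster are natural candidates for that cluster's feature subspace.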
