Venue: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 10)

A Hierarchical Information Theoretic Technique for the Discovery of Non Linear Alternative Clusterings


Abstract

Discovery of alternative clusterings is an important method for exploring complex datasets. It lets the user view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired on datasets containing clusters with non-linear shapes. Our goal in this paper is to address this challenge of non-linearity. In particular, we propose a novel algorithm to uncover an alternative clustering that is distinctively different from an existing reference clustering. Our technique is information-theoretic: it aims to ensure the quality of the alternative clustering by maximizing the mutual information between clustering labels and data observations, whilst ensuring its distinctiveness by minimizing the information shared between the two clusterings. We perform experiments to assess our method against a large range of alternative clustering algorithms from the literature. We show that our technique's performance is generally better in non-linear scenarios and, furthermore, is highly competitive even in simpler, linear scenarios.
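
The distinctiveness criterion in the abstract (little information shared between the reference and the alternative clustering) can be illustrated with a small sketch. The abstract does not say which estimator the paper uses, so the code below assumes the standard plug-in estimate of discrete mutual information between two label vectors; the function name `mutual_information` and the toy labelings are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Plug-in estimate of I(A; B), in nats, between two labelings.

    Assumption: this is the textbook discrete estimator, not necessarily
    the estimator used in the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    count_a = Counter(labels_a)                  # cluster sizes in A
    count_b = Counter(labels_b)                  # cluster sizes in B
    count_ab = Counter(zip(labels_a, labels_b))  # contingency table

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), with counts substituted
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy labelings of four points.
reference = [0, 0, 1, 1]
same_partition = [1, 1, 0, 0]   # same grouping, labels renamed
independent = [0, 1, 0, 1]      # grouping that cuts across the reference

mi_same = mutual_information(reference, same_partition)
mi_independent = mutual_information(reference, independent)
```

Under this measure a relabeled copy of the reference shares the maximum possible information with it, while a labeling that cuts across every reference cluster shares none, which is the kind of clustering the distinctiveness term favours.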
