ACM KDD International Conference on Knowledge Discovery and Data Mining (KDD 2008)

SAIL: Summation-based Incremental Learning for Information-Theoretic Clustering


Abstract

Information-theoretic clustering uses information-theoretic measures as clustering criteria. A common approach is the so-called INFO-K-means, which performs K-means clustering with the KL-divergence as the proximity function. While prior work on INFO-K-means has shown promising results, handling high-dimensional sparse data remains a challenge. For such data, the centroids may contain many zero-valued features. This leads to infinite KL-divergence values, which create a dilemma when assigning objects to centroids during the K-means iterations. To resolve this dilemma, we propose a Summation-based Incremental Learning (SAIL) method for INFO-K-means clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of the KL-divergence with the computation of the Shannon entropy, which avoids the zero-value dilemma caused by the KL-divergence. Our experimental results on various real-world document data sets show that, with SAIL as a booster, the clustering performance of K-means can be significantly improved. SAIL also converges quickly and delivers robust clustering performance on high-dimensional sparse data.
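The zero-value dilemma and the entropy-based workaround can be illustrated with a small sketch. This is not the authors' implementation. It assumes the equivalent objective takes the standard form W·H(m) − Σ w·H(x) for a cluster with total weight W and weighted centroid m, an identity that follows from the definition of the KL-divergence. The function names and toy data are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (natural log); zero entries contribute nothing."""
    return -sum(v * math.log(v) for v in p if v > 0)

def kl(p, q):
    """KL-divergence D(p || q); infinite when q is 0 where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def centroid(points, weights):
    """Weighted mean of the points in a cluster."""
    w = sum(weights)
    dim = len(points[0])
    return [sum(wt * x[i] for x, wt in zip(points, weights)) / w
            for i in range(dim)]

def kl_cost(points, weights):
    """Cluster cost as a weighted sum of KL-divergences to the centroid."""
    m = centroid(points, weights)
    return sum(w * kl(x, m) for x, w in zip(points, weights))

def entropy_cost(points, weights):
    """Assumed equivalent cost: W * H(centroid) - sum_x w_x * H(x)."""
    m = centroid(points, weights)
    return sum(weights) * entropy(m) - sum(
        w * entropy(x) for x, w in zip(points, weights))

# Toy sparse "documents" as normalized term distributions (hypothetical data).
cluster = [[0.5, 0.5, 0.0, 0.0], [0.25, 0.75, 0.0, 0.0]]
weights = [0.5, 0.5]
outsider = [0.0, 0.5, 0.5, 0.0]

m = centroid(cluster, weights)

# Assigning by KL fails: the centroid is zero on a feature the outsider uses.
kl_to_centroid = kl(outsider, m)

# The entropy form matches the KL form on the existing cluster ...
cost_kl = kl_cost(cluster, weights)
cost_h = entropy_cost(cluster, weights)

# ... and stays finite when the outsider is tentatively added to the cluster.
cost_with_outsider = entropy_cost(cluster + [outsider], weights + [0.5])
```

Here `kl_to_centroid` is infinite, so a KL-based assignment step cannot rank this centroid against others, whereas comparing `cost_with_outsider` with `cost_h` gives a finite change in the objective for the candidate move.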
