Conference: Intelligent Data Engineering and Automated Learning

Clustering by similarity in an auxiliary space



Abstract

We present a clustering method for continuous data. It defines local clusters in the (primary) data space but derives its similarity measure from the posterior distributions of additional discrete (auxiliary) data that occur as pairs with the primary data. In one case study, enterprises are clustered using a similarity measure derived from bankruptcy sensitivity. In another, a content-based clustering of text documents is found by measuring differences between their metadata (keyword distributions). We show that minimizing our Kullback-Leibler divergence-based distortion measure within the categories is equivalent to maximizing the mutual information between the categories and the distributions in the auxiliary space. A simple on-line algorithm for minimizing the distortion is introduced for Gaussian basis functions and their analogs on a hypersphere.
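The stated equivalence between minimizing the Kullback-Leibler distortion and maximizing mutual information can be checked numerically on a toy problem. The sketch below is not the paper's algorithm. It assumes hard cluster assignments instead of Gaussian basis functions, four hypothetical primary-space points with made-up weights `p_x` and conditional distributions `p_c_given_x`, and cluster prototypes set to the within-cluster conditional distribution p(c | v). Under those assumptions the distortion plus I(V;C) equals the constant I(X;C), so the assignment with the lowest distortion is also the one with the highest mutual information. All function and variable names are illustrative.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """Mutual information I(A;B) in nats for a joint table joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        pab * math.log(pab / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, pab in enumerate(row)
        if pab > 0
    )

# Hypothetical toy data: four primary-space points x with weights p(x)
# and conditional distributions p(c | x) over two auxiliary classes.
p_x = [0.1, 0.4, 0.3, 0.2]
p_c_given_x = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]

def evaluate(assignment, n_clusters=2):
    """Return (distortion, I(V;C)) for a hard assignment of points to clusters.

    Assumes every cluster receives at least one point.
    """
    n_classes = len(p_c_given_x[0])
    # Joint distribution p(v, c) of cluster index v and auxiliary class c.
    joint_vc = [[0.0] * n_classes for _ in range(n_clusters)]
    for x, v in enumerate(assignment):
        for c in range(n_classes):
            joint_vc[v][c] += p_x[x] * p_c_given_x[x][c]
    # Prototype of cluster v: the conditional distribution p(c | v).
    prototypes = [[pvc / sum(row) for pvc in row] for row in joint_vc]
    # Average KL divergence between each point's p(c | x) and its prototype.
    distortion = sum(
        p_x[x] * kl(p_c_given_x[x], prototypes[v])
        for x, v in enumerate(assignment)
    )
    return distortion, mutual_information(joint_vc)

# I(X;C) does not depend on the clustering.
joint_xc = [[p_x[x] * pc for pc in p_c_given_x[x]] for x in range(len(p_x))]
i_xc = mutual_information(joint_xc)

for assignment in ([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]):
    distortion, i_vc = evaluate(assignment)
    print(assignment, round(distortion, 4), round(i_vc, 4),
          round(distortion + i_vc - i_xc, 12))
```

The last printed column is the gap between `distortion + I(V;C)` and `I(X;C)`, which should vanish up to floating-point error for every assignment.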
