International Conference on Artificial Neural Networks

Minimum-Entropy Data Clustering Using Reversible Jump Markov Chain Monte Carlo

Abstract

Many problems in data analysis, especially in signal and image processing, require the unsupervised partitioning of data into a set of 'self-similar' classes or clusters. An ideal partitioning unambiguously assigns each datum to a single class, and one thinks of the data as being generated by a number of data generators, one for each class. Many algorithms have been proposed for such analysis and for estimating the optimal number of partitions. The majority of popular and computationally feasible techniques rely on the assumption that classes are hyper-ellipsoidal in shape. In Gaussian mixture modelling [15,6] this assumption is explicit; in dendrogram linkage methods (which typically rely on the L_2 norm) it is implicit [9]. For some data sets this leads to over-partitioning. Alternative methods, for example those based on valley seeking [6] or on maxima tracking in scale space [16,18,13], have the advantage of being free from such assumptions. However, they can be sensitive to noise and computationally intensive in high-dimensional spaces. In this paper we reconsider the issue of data partitioning from an information-theoretic viewpoint and show that minimisation of partition entropy may be used to evaluate the most probable set of data generators. Rather than formulating the problem as one of traditional model-order estimation to infer the most probable number of classes, we employ a reversible jump mechanism in a Markov chain Monte Carlo (MCMC) sampler, which explores the space of different model sizes.
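The abstract does not define partition entropy precisely. The following is a minimal sketch, assuming that partition entropy is the average Shannon entropy of the class posteriors p(k | x) under a one-dimensional Gaussian mixture; the paper's exact definition may differ. The function names `gaussian_pdf`, `posteriors`, and `partition_entropy`, as well as the toy data and mixture parameters, are hypothetical and serve only as illustration. The example shows the intuition behind the approach: when generators assign each datum to one class unambiguously, the entropy is near zero. When assignments are ambiguous, it approaches log K.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of the 1-D normal distribution N(mean, std**2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posteriors(x, weights, means, stds):
    """Class posteriors p(k | x) of a 1-D Gaussian mixture, via Bayes' rule."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    evidence = sum(joint)  # p(x) = sum_k p(k) p(x | k)
    return [j / evidence for j in joint]


def partition_entropy(data, weights, means, stds):
    """Average Shannon entropy (nats) of the class posteriors over the data.

    ASSUMED definition (not taken from the paper):
        H = -(1/N) * sum_n sum_k p(k | x_n) * log p(k | x_n)
    H is 0 when every datum is assigned to one class with certainty and
    reaches log(K) when every datum is split evenly across K classes.
    """
    total = 0.0
    for x in data:
        for p in posteriors(x, weights, means, stds):
            if p > 0.0:  # 0 * log 0 is taken as 0
                total -= p * math.log(p)
    return total / len(data)


# Hypothetical toy data: two groups centred near -2 and +2.
data = [-2.1, -1.9, -2.0, 2.0, 1.8, 2.2]

# Two narrow, well-separated generators: near-unambiguous assignments.
h_sep = partition_entropy(data, [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])
# Two broad, heavily overlapping generators: memberships close to 50/50.
h_overlap = partition_entropy(data, [0.5, 0.5], [-0.2, 0.2], [2.0, 2.0])
print(f"separated: {h_sep:.4f} nats, overlapping: {h_overlap:.4f} nats")
```

Under this assumed definition, a single generator (K = 1) also gives H = 0, so entropy on its own cannot select the number of classes. According to the abstract, that choice is left to the reversible jump sampler, but the abstract does not say how the entropy term enters the sampler's target distribution.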

