Conference: AI 2003: Advances in Artificial Intelligence

Unsupervised Learning of Correlated Multivariate Gaussian Mixture Models Using MML


Abstract

Mixture modelling, or unsupervised classification, is the problem of identifying and modelling components (or clusters, or classes) in a body of data. We consider here the application of the Minimum Message Length (MML) principle to mixture modelling with multivariate Gaussian distributions. Earlier work in MML mixture modelling covers the multinomial, Gaussian, Poisson, von Mises circular, and Student t distributions; in these applications, all variables in a component are assumed to be uncorrelated with each other. In this paper, we propose a more general type of MML mixture modelling which allows the variables within a component to be correlated. Two MML approximations are used: the Wallace and Freeman (1987) approximation and Dowe's MMLD approximation (2002). The former is used for calculating the relative abundances (mixing proportions) of each component, and the latter for estimating the distribution parameters of the components of the mixture model. The proposed method is applied to the analysis of two real-world datasets - the well-known (Fisher) Iris and diabetes datasets. The modelling results are then compared, in terms of their probability bit-costings, with those obtained using two other modelling criteria, AIC and BIC (which is identical to Rissanen's 1978 MDL). The comparison shows that the proposed MML method performs better than both of these criteria. Furthermore, the MML method recovers the three underlying Iris species more closely than either AIC or BIC.
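To make the comparison setting concrete, the following is a minimal sketch of how a correlated (full-covariance) Gaussian mixture might be scored under the two baseline criteria, AIC and BIC, and how a log-likelihood converts to a bit cost. It is not the paper's MML estimator: the Wallace-Freeman and MMLD message lengths are not reproduced here. All function names are hypothetical, and the AIC and BIC formulas are the standard textbook definitions rather than anything taken from the paper.

```python
import numpy as np


def n_free_params(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance mixture:
    (k - 1) mixing proportions, k*d means, k*d*(d+1)/2 covariance entries."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def gaussian_logpdf(X, mean, cov):
    """Natural-log density of each row of X under N(mean, cov).
    cov must be symmetric positive definite (variables may be correlated)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)              # cov = L @ L.T
    z = np.linalg.solve(L, (X - mean).T)     # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z * z, axis=0))


def mixture_loglik(X, weights, means, covs):
    """Total natural-log likelihood of X under the mixture (log-sum-exp for stability)."""
    comp = np.stack([
        np.log(w) + gaussian_logpdf(X, m, c)
        for w, m, c in zip(weights, means, covs)
    ])                                        # shape (k, n)
    mx = comp.max(axis=0)
    return float(np.sum(mx + np.log(np.sum(np.exp(comp - mx), axis=0))))


def aic(loglik, p):
    """Akaike information criterion (lower is better)."""
    return 2.0 * p - 2.0 * loglik


def bic(loglik, p, n):
    """Bayesian information criterion (lower is better)."""
    return p * np.log(n) - 2.0 * loglik


def bit_cost(loglik):
    """Negative log-likelihood expressed in bits rather than nats."""
    return -loglik / np.log(2.0)
```

For the four-dimensional Iris data with three components, `n_free_params(3, 4)` counts 2 mixing proportions, 12 mean entries, and 30 covariance entries. Allowing correlation is what adds the off-diagonal covariance terms, which is why the parameter penalty grows quickly with dimension in any of these criteria.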
