Journal: Advances in Data Analysis and Classification

A latent variables approach for clustering mixed binary and continuous variables within a Gaussian mixture model

Abstract

When clustering objects, we often collect not only continuous variables but binary attributes as well. This paper proposes a model-based clustering approach for mixed binary and continuous variables in which each binary attribute is generated by a latent continuous variable that is dichotomized at a suitable threshold value, and in which the scores of the latent variables are estimated from the binary data. In economics, such variables are called utility functions, and the assumption is that the binary attributes (the presence or absence of a public service or utility) are determined by low and high values of these functions. In genetics, the latent response is interpreted as the 'liability' to develop a qualitative trait or phenotype. The estimated scores of the latent variables, together with the observed continuous variables, make it possible to use a multivariate Gaussian mixture model for clustering instead of a mixture of discrete and continuous distributions. After describing the method, this paper presents results on both simulated and real-case data and compares the performance of the multivariate Gaussian mixture model with that of a mixture of joint multivariate and multinomial distributions. Results show that the former model outperforms the mixture model for variables with different scales, both in terms of classification error rate and in the reproduction of the cluster means.
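The dichotomization idea can be illustrated with a minimal Python sketch. It is not the estimator used in the paper, which the abstract does not specify. It assumes each latent variable is marginally standard normal, sets the threshold from the observed proportion of ones, and replaces each binary value with the mean of the corresponding truncated normal. The function name `latent_scores` and the scoring rule are hypothetical choices made for illustration only.

```python
from statistics import NormalDist

# Assumption: the latent variable behind each binary attribute is N(0, 1).
STD_NORMAL = NormalDist(0.0, 1.0)


def latent_scores(binary_column):
    """Return (threshold, scores) for one 0/1 column.

    Each score is the conditional mean of a standard normal variable,
    given that it falls above (for 1) or below (for 0) the threshold.
    """
    n = len(binary_column)
    p_one = sum(binary_column) / n
    if not 0.0 < p_one < 1.0:
        raise ValueError("column must contain both 0s and 1s")

    # Threshold tau chosen so that P(z > tau) equals the share of ones.
    tau = STD_NORMAL.inv_cdf(1.0 - p_one)
    density = STD_NORMAL.pdf(tau)

    score_one = density / p_one             # E[z | z > tau]
    score_zero = -density / (1.0 - p_one)   # E[z | z <= tau]

    scores = [score_one if b == 1 else score_zero for b in binary_column]
    return tau, scores


if __name__ == "__main__":
    tau, scores = latent_scores([1, 0, 1, 1, 0, 0, 0, 1])
    print(tau, scores)
```

Under these assumptions, the resulting score columns could be stacked with the observed continuous variables and passed to any Gaussian mixture fitter, for example `sklearn.mixture.GaussianMixture`. Because this rule yields only two distinct scores per attribute, it should be read as a rough stand-in for the latent-score estimation described above.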