Journal: Computational Intelligence

Mixture-based clustering for count data using approximated Fisher Scoring and Minorization-Maximization approaches

Abstract

The multinomial distribution has been widely used to model count data. To increase clustering efficiency, we use an approximation to the Fisher scoring algorithm, which is more robust to the choice of initial parameter values. We then use a novel approach, based on the minimum message length criterion, to estimate the optimal number of components. Moreover, we consider a generalization of the multinomial model obtained by introducing the Dirichlet distribution as a prior, yielding the Dirichlet Compound Multinomial (DCM). Even though the DCM can address the burstiness phenomenon of count data, the presence of the Gamma function in its density function usually leads to undesired complications. In this article, we use two alternative representations of the DCM distribution to perform clustering based on finite mixture models, where the mixture parameters are estimated using the minorization-maximization framework. To evaluate and compare the performance of our proposed models, we consider three challenging real-world applications that involve high-dimensional count vectors, namely sentiment analysis, facial expression recognition, and human action recognition. The results show that the proposed algorithms remarkably increase the clustering efficiency of their respective models, and that the best results are achieved by the second parametrization of the DCM, which can accommodate over-dispersed count data.
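For readers unfamiliar with the DCM, the sketch below evaluates its log-probability in the standard textbook form, in which the Gamma-function terms mentioned in the abstract appear explicitly. It is only an illustration, not the article's method: the function name `dcm_log_pmf` and its arguments are hypothetical, and the two alternative representations used by the authors are not reproduced here.

```python
import math


def dcm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet Compound Multinomial.

    Standard form (an assumption of this sketch, not taken from the article):
        P(x | alpha) = n! / prod(x_d!)
                       * Gamma(A) / Gamma(n + A)
                       * prod(Gamma(x_d + alpha_d) / Gamma(alpha_d))
    with n = sum(x_d) and A = sum(alpha_d).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha values must be positive")

    n = sum(counts)
    a_total = sum(alpha)

    # Multinomial coefficient n! / prod(x_d!), computed in log space.
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio Gamma(A) / Gamma(n + A).
    log_norm = math.lgamma(a_total) - math.lgamma(n + a_total)
    # Per-dimension ratios Gamma(x_d + alpha_d) / Gamma(alpha_d).
    log_terms = sum(
        math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms
```

Working in log space with `math.lgamma` avoids overflow for the high-dimensional count vectors the abstract refers to. With small `alpha` values, the function assigns more mass to vectors in which a few categories repeat, which is one way to see the burstiness behaviour the abstract describes.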