Conference: Canadian conference on artificial intelligence

MML-Based Approach for Determining the Number of Topics in EDCM Mixture Models

Abstract

This paper proposes an unsupervised algorithm for learning a finite mixture of EDCM distributions, where EDCM is the exponential-family approximation to the Dirichlet Compound Multinomial. An important part of the mixture modeling problem is determining the number of components that best describes the data. In this work, we extend the Minimum Message Length (MML) principle to determine the number of topics (clusters) when text is modeled with a mixture of EDCMs. Parameter estimation is based on the previously proposed deterministic annealing expectation-maximization approach. The proposed method is validated on several document collections and compared with results obtained using other selection criteria.
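
As a rough illustration of the model being fitted, the sketch below evaluates the log-likelihood of a mixture of EDCMs in Python. It assumes the commonly cited EDCM form q(x) = n! Γ(s)/Γ(s+n) ∏_{w: x_w ≥ 1} β_w / x_w, with n the document length and s the sum of the β_w; the abstract itself does not state this formula. The function names and the data layout (one word-count vector per document) are hypothetical, and neither the MML criterion nor the deterministic annealing EM procedure is reproduced here.

```python
import math
from typing import Sequence


def edcm_log_pmf(counts: Sequence[int], beta: Sequence[float]) -> float:
    """Log of the assumed EDCM form for one document's word-count vector."""
    if len(counts) != len(beta):
        raise ValueError("counts and beta must have the same length")
    n = sum(counts)  # document length
    s = sum(beta)    # sum of the EDCM parameters
    log_q = math.lgamma(n + 1) + math.lgamma(s) - math.lgamma(s + n)
    for x_w, b_w in zip(counts, beta):
        if x_w >= 1:  # only words that occur in the document contribute
            log_q += math.log(b_w) - math.log(x_w)
    return log_q


def mixture_log_likelihood(
    docs: Sequence[Sequence[int]],
    weights: Sequence[float],
    betas: Sequence[Sequence[float]],
) -> float:
    """Log-likelihood of all documents under a finite mixture of EDCMs."""
    total = 0.0
    for counts in docs:
        # Per-component terms: log(mixing weight) + log(component density).
        terms = [
            math.log(pi_k) + edcm_log_pmf(counts, beta_k)
            for pi_k, beta_k in zip(weights, betas)
        ]
        # Log-sum-exp over components for numerical stability.
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

A selection criterion such as MML would compare a penalized version of this quantity across candidate numbers of components; the penalty term is specific to the paper and is not shown here.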
