Conference: SIAM International Conference on Data Mining

Multi-field Correlated Topic Modeling

Abstract

Popular probabilistic topic models such as Latent Dirichlet Allocation (LDA) and Correlated Topic Models (CTM) share an important property: they use a common set of topics to model all the data. This property can be too restrictive for modeling complex data entries in which multiple fields of heterogeneous data jointly provide rich information about each object or event. We propose a new extension of the CTM method that enables modeling with multi-field topics in a global graphical structure, together with a mean-field variational algorithm that allows joint learning of multinomial topic models for discrete data and Gaussian-style topic models for real-valued data. We conducted experiments with both simulated and real data and observed that the multi-field CTM outperforms a conventional CTM in both likelihood maximization and perplexity reduction. A deeper analysis of the simulated data reveals that the superior performance results from successful discovery of the mapping between field-specific topics and observed data.
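To make the idea of field-specific topics more concrete, the following is a minimal sketch of one plausible generative process, not the paper's actual model. It assumes two fields (discrete words and real-valued measurements), each with its own topic set, and it assumes that the per-field topic proportions come from blocks of a single jointly Gaussian vector passed through a softmax, so that the covariance can correlate topics across fields. All names (`sample_document`, `word_topics`, `gauss_means`, and so on) and all parameter values are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_document(rng, mu, sigma, word_topics, gauss_means, gauss_std,
                    n_words, n_reals):
    """Draw one two-field document under the assumed generative process."""
    k_words = word_topics.shape[0]
    vocab_size = word_topics.shape[1]

    # One global Gaussian draw; its covariance couples the two fields (assumption).
    eta = rng.multivariate_normal(mu, sigma)

    # Each field gets its own topic proportions from its block of eta.
    theta_words = softmax(eta[:k_words])
    theta_reals = softmax(eta[k_words:])

    # Discrete field: pick a topic per token, then a word from that topic.
    z_w = rng.choice(k_words, size=n_words, p=theta_words)
    words = np.array([rng.choice(vocab_size, p=word_topics[z]) for z in z_w])

    # Real-valued field: pick a topic per value, then a Gaussian draw around its mean.
    z_r = rng.choice(len(gauss_means), size=n_reals, p=theta_reals)
    reals = rng.normal(gauss_means[z_r], gauss_std)
    return words, reals

# Hypothetical sizes and parameters, chosen only for illustration.
rng = np.random.default_rng(0)
K_WORDS, K_REALS, VOCAB = 3, 2, 5
mu = np.zeros(K_WORDS + K_REALS)
a = rng.normal(size=(K_WORDS + K_REALS, K_WORDS + K_REALS))
sigma = a @ a.T + np.eye(K_WORDS + K_REALS)  # symmetric positive definite
word_topics = rng.dirichlet(np.ones(VOCAB), size=K_WORDS)
gauss_means = np.array([-2.0, 2.0])

words, reals = sample_document(rng, mu, sigma, word_topics, gauss_means,
                               gauss_std=0.5, n_words=20, n_reals=10)
```

In this sketch, a conventional CTM would correspond to using a single topic set and a single `theta` for all observations. The variational inference procedure mentioned in the abstract is not shown here.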
