Systems and Computers in Japan

Parametric Mixture Model for Multitopic Text



Abstract

In general, text has multiple topics. Automatic topic detection from text is therefore harder than traditional pattern classification tasks, because multiple categories must be considered simultaneously in text categorization. Since conventional methods do not consider a generative model of multicategory text, they face a fundamental limitation when applied to the multicategory detection problem. In this paper, we propose new probabilistic generative models, parametric mixture models (PMM1 and PMM2), and then present a method for simultaneously detecting multiple topics in text using PMMs. In PMMs, every multitopic class can be completely represented by basis vectors, each of which corresponds to a single-topic class. Moreover, the global optimality of the estimated parameter values is theoretically guaranteed for PMM1. Furthermore, the parameter estimation and topic detection algorithms are quite efficient. We also empirically demonstrate the usefulness of our method through multitopic categorization of World Wide Web pages, focusing on those from the "yahoo.com" domain.
