Journal: ACM Transactions on Knowledge Discovery from Data

Probabilistic Topic Modeling for Comparative Analysis of Document Collections

Abstract

Probabilistic topic models, which can discover hidden patterns in documents, have been extensively studied. However, rather than learning from a single document collection, numerous real-world applications demand a comprehensive understanding of the relationships among various document sets. To address such needs, this article proposes a new model that can identify the common and discriminative aspects of multiple datasets. Specifically, our proposed method is a Bayesian approach that represents each document as a combination of common topics (shared across all document sets) and distinctive topics (distributions over words that are exclusive to a particular dataset). Through extensive experiments, we demonstrate the effectiveness of our method compared with state-of-the-art models. The proposed model can be useful for "comparative thinking" analysis of real-world document collections.
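
The representation described in the abstract, where each document mixes topics shared by all collections with topics exclusive to its own collection, can be illustrated with a small generative sketch. This is not the article's actual model: the Dirichlet priors, the hyperparameter values, the function names `sample_dirichlet` and `generate_corpus`, and the LDA-style word sampling are all assumptions made for illustration, and posterior inference is omitted entirely.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_collections=2, n_common=2, n_distinct=1,
                    docs_per_collection=3, doc_len=8, seed=0):
    """Generate toy documents from common and collection-specific topics.

    All hyperparameters here are hypothetical; the abstract does not
    specify priors or topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len(vocab)

    # Common topics: word distributions shared by every collection.
    common = [sample_dirichlet([0.5] * vocab_size, rng)
              for _ in range(n_common)]

    corpus = []
    for c in range(n_collections):
        # Distinctive topics: word distributions used only by collection c.
        distinct = [sample_dirichlet([0.5] * vocab_size, rng)
                    for _ in range(n_distinct)]
        topics = common + distinct

        for _ in range(docs_per_collection):
            # Per-document mixture over common + distinctive topics.
            theta = sample_dirichlet([1.0] * len(topics), rng)
            words, labels = [], []
            for _ in range(doc_len):
                k = rng.choices(range(len(topics)), weights=theta)[0]
                w = rng.choices(vocab, weights=topics[k])[0]
                words.append(w)
                labels.append("common" if k < n_common else "distinct")
            corpus.append({"collection": c, "theta": theta,
                           "words": words, "labels": labels})
    return corpus


if __name__ == "__main__":
    toy_vocab = ["model", "topic", "data", "word", "prior", "corpus"]
    for doc in generate_corpus(toy_vocab):
        print(doc["collection"], " ".join(doc["words"]))
```

In this sketch a document from one collection can never draw words from another collection's distinctive topics, which is one simple way to encode the "exclusive to a particular dataset" property; the article may enforce it differently.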
