Conference: CIKM 10; ACM conference on information and knowledge management

PTM: Probabilistic Topic Mapping Model for Mining Parallel Document Collections

Abstract

Many applications generate large volumes of parallel document collections. A parallel document collection consists of two sets of documents in which each document in one set corresponds to a document in the other, so that the two form a semantic pair (e.g., pairs of problem and solution descriptions in a help-desk setting). Although much work has been done on text mining, little previous work has attempted to mine this kind of text data. In this paper, we propose a new probabilistic topic model, called the Probabilistic Topic Mapping (PTM) model, that mines parallel document collections to simultaneously discover latent topics in both sets of documents and the mapping of topics in one set to those in the other. We evaluate the PTM model on a parallel document collection in the IT service domain. We show that PTM can effectively discover meaningful topics, as well as their mappings, and that it is also useful for improving text matching and retrieval when there is a vocabulary gap.
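
To make the notion of a topic mapping concrete, the following is a minimal sketch and not the PTM model itself. PTM learns the topics and their mapping jointly, whereas this sketch assumes that per-document topic proportions are already available and only estimates a mapping from co-occurrence across paired documents. The function name `estimate_topic_mapping`, the topic labels, and the toy help-desk pairs are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def estimate_topic_mapping(pairs):
    """Estimate P(target topic | source topic) from paired documents.

    `pairs` is a list of (source_topics, target_topics) tuples, where each
    element is a dict mapping a topic label to its proportion in the document.
    This is a simple co-occurrence estimate (an assumption of this sketch),
    not the joint inference described in the abstract.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for source_topics, target_topics in pairs:
        for s, p_s in source_topics.items():
            for t, p_t in target_topics.items():
                # Each pair contributes the product of its topic proportions.
                weights[s][t] += p_s * p_t

    mapping = {}
    for s, row in weights.items():
        total = sum(row.values())
        # Normalize each row so it forms a distribution over target topics.
        mapping[s] = {t: w / total for t, w in row.items()} if total > 0 else {}
    return mapping


# Hypothetical problem/solution pairs with assumed topic proportions.
pairs = [
    ({"login_failure": 1.0}, {"password_reset": 1.0}),
    ({"login_failure": 1.0}, {"password_reset": 1.0}),
    ({"login_failure": 0.5, "disk_full": 0.5},
     {"password_reset": 0.5, "cleanup": 0.5}),
    ({"disk_full": 1.0}, {"cleanup": 1.0}),
]

mapping = estimate_topic_mapping(pairs)
for source, row in mapping.items():
    best = max(row, key=row.get)
    print(f"{source} -> {best} ({row[best]:.2f})")
```

Each row of `mapping` is a distribution over the topics of the second collection. Such a mapping could help bridge a vocabulary gap in retrieval: a query about a problem topic can be matched against solution documents through the mapped topic rather than through shared words.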