Conference: International Conference on Web Information Systems Engineering

Topical Pattern Based Document Modelling and Relevance Ranking

Abstract

Traditional information filtering (IF) models often assume that the documents in a collection relate to a single topic. In reality, users' interests can be diverse, and the documents in a collection often involve multiple topics. Topic modelling was proposed to generate statistical models that represent multiple topics in a collection of documents. In a topic model, however, topics are represented by distributions over words, which are limited in their ability to distinctively represent the semantics of topics. Patterns are generally considered more discriminative than single terms and are able to reveal the inner relations between words. This paper proposes a novel information filtering model, the Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics, and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms state-of-the-art models.
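
As a rough illustration of the idea described in the abstract, the following Python sketch scores a document against several topics, each represented by term patterns. It is not the SPBTM algorithm from the paper. The grouping rule (patterns with equal support form a group), the selection rule (keep only the maximal matched patterns in each group), the scoring formula, and all names, weights and supports are assumptions made purely for illustration.

```python
from collections import defaultdict


def significant_matched(patterns, doc_terms):
    """Pick illustrative 'significant matched patterns' for one topic.

    patterns: dict mapping frozenset of terms -> support count.
    Assumption: patterns with the same support form one group (a stand-in
    for "statistical features"), and within a group only matched patterns
    that are not a proper subset of another matched pattern are kept
    (a stand-in for "taxonomic features").
    """
    groups = defaultdict(list)
    for pattern, support in patterns.items():
        if pattern <= doc_terms:  # pattern matches if all its terms occur
            groups[support].append(pattern)

    significant = {}
    for support, group in groups.items():
        for p in group:
            if not any(p < q for q in group):  # keep maximal patterns only
                significant[p] = support
    return significant


def relevance(topics, doc_terms):
    """Hypothetical relevance: topic weight times the sum of
    (pattern length * support) over the selected patterns."""
    score = 0.0
    for topic in topics:
        selected = significant_matched(topic["patterns"], doc_terms)
        score += topic["weight"] * sum(len(p) * s for p, s in selected.items())
    return score


# Toy user profile with two topics (all values invented for the example).
topics = [
    {
        "weight": 0.6,
        "patterns": {
            frozenset({"topic"}): 5,
            frozenset({"topic", "model"}): 5,
            frozenset({"pattern"}): 3,
        },
    },
    {
        "weight": 0.4,
        "patterns": {
            frozenset({"ranking"}): 4,
            frozenset({"relevance", "ranking"}): 2,
        },
    },
]

doc_a = {"topic", "model", "pattern", "mining"}
doc_b = {"ranking", "relevance"}

score_a = relevance(topics, doc_a)
score_b = relevance(topics, doc_b)
ranked = sorted([("doc_a", score_a), ("doc_b", score_b)],
                key=lambda item: item[1], reverse=True)
print(ranked)
```

In this toy setup, the pattern {topic} is dropped for doc_a because the longer pattern {topic, model} sits in the same support group and also matches, which mirrors the intuition that a more specific pattern is more representative than a single term.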
