Journal: Information Retrieval

Blog feed search with a post index

Abstract

User generated content forms an important domain for mining knowledge. In this paper, we address the task of blog feed search: to find blogs that are principally devoted to a given topic, as opposed to blogs that merely happen to mention the topic in passing. The large number of blogs makes the blogosphere a challenging domain, both in terms of effectiveness and of storage and retrieval efficiency. We examine the effectiveness of an approach to blog feed search that is based on individual posts as indexing units (instead of full blogs). Working in the setting of a probabilistic language modeling approach to information retrieval, we model the blog feed search task by aggregating over a blogger's posts to collect evidence of relevance to the topic and persistence of interest in the topic. This approach achieves state-of-the-art performance in terms of effectiveness. We then introduce a two-stage model where a pre-selection of candidate blogs is followed by a ranking step. The model integrates aggressive pruning techniques as well as very lean representations of the contents of blog posts, resulting in substantial gains in efficiency while maintaining effectiveness at a very competitive level.
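The abstract does not give the scoring formulas, so the following Python block is only a minimal sketch of how post-level evidence can be aggregated into a blog score and how a two-stage pipeline fits around it. It assumes a Dirichlet-smoothed query-likelihood model per post, a blog score of P(q|blog) = Σ_post P(q|post)·P(post|blog) with P(post|blog) taken as uniform, and a pre-selection stage that keeps only blogs owning one of the top-scoring posts. The function names, the parameters `mu` and `top_posts`, and their default values are hypothetical illustrations, not the authors' exact model or pruning scheme.

```python
import math
from collections import Counter

# Hypothetical sketch: post-level language-model scoring aggregated per blog,
# with a two-stage (pre-select, then rank) pipeline. Not the paper's exact model.


def build_collection_stats(posts):
    """Return collection term counts and total token count over all posts."""
    coll_tf = Counter()
    for _blog, tokens in posts:
        coll_tf.update(tokens)
    return coll_tf, sum(coll_tf.values())


def post_log_likelihood(query, tokens, coll_tf, coll_len, mu=2000.0):
    """log P(q | post) under a Dirichlet-smoothed unigram model (mu is assumed)."""
    tf = Counter(tokens)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        if p_coll == 0.0:
            # Term occurs in no post: it would zero out every likelihood, so skip it.
            continue
        score += math.log((tf[term] + mu * p_coll) / (len(tokens) + mu))
    return score


def log_mean_exp(values):
    """Numerically stable log((1/n) * sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def two_stage_blog_ranking(query, posts, top_posts=100, mu=2000.0):
    """Rank blogs for a query.

    posts: list of (blog_id, tokens) pairs, one per post (the post index).
    Returns a list of (blog_id, log P(q | blog)), best first.
    """
    coll_tf, coll_len = build_collection_stats(posts)
    # Stage 1 (pre-selection): score posts and keep the blogs that own one of
    # the top_posts best posts. A real system would use an inverted index here
    # instead of scoring every post.
    scored = [(blog, post_log_likelihood(query, toks, coll_tf, coll_len, mu))
              for blog, toks in posts]
    ranked_posts = sorted(scored, key=lambda pair: pair[1], reverse=True)
    candidates = {blog for blog, _ in ranked_posts[:top_posts]}

    # Stage 2 (ranking): P(q | blog) = sum_post P(q | post) * P(post | blog),
    # with P(post | blog) = 1 / (number of posts in the blog), i.e. the mean
    # post likelihood, computed in log space.
    per_blog = {}
    for blog, ll in scored:
        if blog in candidates:
            per_blog.setdefault(blog, []).append(ll)
    ranking = [(blog, log_mean_exp(lls)) for blog, lls in per_blog.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    posts = [
        ("blogA", "python tips python tricks".split()),
        ("blogA", "more python packaging notes".split()),
        ("blogB", "travel diary with one python mention".split()),
        ("blogB", "travel photos from rome".split()),
    ]
    for blog, score in two_stage_blog_ranking(["python"], posts, top_posts=3):
        print(blog, round(score, 4))  # blogA is listed before blogB
```

Averaging over all of a blog's posts, rather than taking only its best post, rewards blogs that return to the topic repeatedly, which is one simple way to reflect the "persistence of interest" mentioned above; a max-over-posts aggregator would instead favor a blog with a single strongly matching post.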