Conference: Information Retrieval Technology

News Page Discovery Policy for Instant Crawlers



Abstract

Many news pages with strict freshness requirements are published on the internet every day. Instant crawlers should download them immediately; otherwise, they quickly become outdated. In the past, instant crawlers downloaded pages only from a manually generated list of news websites. Because news websites do not publish news pages exclusively, bandwidth is wasted on downloading non-news pages. This paper proposes a novel approach to discovering news pages, consisting of seed selection and news URL prediction based on user behavior analysis. Empirical studies on two months of user access logs show that our approach outperforms the traditional approach in both precision and recall.
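
The abstract reports results in terms of precision and recall. As a minimal sketch of how these two measures could be computed for a set of discovered URLs, assuming a labeled set of true news URLs is available, consider the following. The function name `precision_recall` and the example URLs are hypothetical and are not taken from the paper; the paper's own evaluation procedure is not described in the abstract.

```python
def precision_recall(predicted_urls, news_urls):
    """Return (precision, recall) of predicted URLs against known news URLs.

    precision = correctly predicted news URLs / all predicted URLs
    recall    = correctly predicted news URLs / all true news URLs
    """
    predicted = set(predicted_urls)
    relevant = set(news_urls)
    true_positives = len(predicted & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical URLs, for illustration only.
discovered = ["a.example/news/1", "a.example/news/2", "a.example/about"]
actual_news = ["a.example/news/1", "a.example/news/2", "a.example/news/3"]
precision, recall = precision_recall(discovered, actual_news)
```

Under this reading, a crawler that downloads many non-news pages from a news website scores low on precision, while one that misses freshly published news pages scores low on recall.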
