Conference: International Conference on Theory and Practice of Digital Libraries

Profiling Web Archive Coverage for Top-Level Domain and Content Language

Abstract

The Memento aggregator currently polls every known public web archive when serving a request for an archived web page, even though some web archives focus only on specific domains and ignore the others. Similar to query routing in distributed search, we investigate the impact on aggregated Memento TimeMaps (lists of when and where a web page was archived) of sending queries only to archives likely to hold the archived page. We profile twelve public web archives using data from a variety of sources (the web, archives' access logs, and full-text queries to archives) and find that sending queries only to the top three web archives for any request (i.e., a 75% reduction in the number of queries) produces the full TimeMap in 84% of the cases.
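
The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the profile table, the archive names, the capture counts, and the helpers `tld_of`, `route`, and `query_reduction` are all hypothetical, and ranking by raw per-TLD counts is an assumption made only to keep the example small.

```python
from urllib.parse import urlparse

# Hypothetical profile: for each top-level domain, the number of captures
# each archive is believed to hold. Names and counts are invented.
PROFILE = {
    "com": {"archive_a": 900, "archive_b": 120, "archive_c": 40, "archive_d": 5},
    "uk": {"archive_d": 700, "archive_a": 300, "archive_b": 10, "archive_c": 0},
}


def tld_of(url):
    """Return the last label of the URL's host name, lowercased."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def route(url, profile, k=3):
    """Pick the k archives with the most captures for the URL's TLD.

    Returns an empty list when the TLD is not in the profile; a real
    aggregator would presumably fall back to polling every archive.
    """
    counts = profile.get(tld_of(url), {})
    # Sort by descending count, breaking ties by name for determinism.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return ranked[:k]


def query_reduction(total_archives, k):
    """Fraction of queries saved by polling k of total_archives."""
    return 1 - k / total_archives
```

With twelve archives and k = 3, `query_reduction(12, 3)` gives 0.75, which matches the 75% reduction stated in the abstract.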