International Journal on Digital Libraries

Moved but not gone: an evaluation of real-time methods for discovering replacement web pages


Abstract

Inaccessible Web pages and 404 "Page Not Found" responses are a common Web phenomenon and a detriment to the user's browsing experience. The rediscovery of missing Web pages is, therefore, a relevant research topic in both digital preservation and information retrieval. In this article, we bring these two areas together by analyzing four content- and link-based methods to rediscover missing Web pages. We investigate the retrieval performance of the methods individually and in combination, and give insight into how effective these methods are over time. As the main result of this work, we are able to recommend not only the best-performing methods but also the sequence in which they should be applied, based on their performance, the complexity required to generate them, and their evolution over time. Our least complex single method results in a rediscovery rate of almost 70% of the Web pages in our sample dataset, which is based on URIs sampled from the Open Directory Project (DMOZ). By increasing the complexity level and combining three different methods, our results show an increase of the success rate to up to 77%. The results, based on our sample dataset, indicate that Web pages are often not completely lost but have moved to a different location and "just" need to be rediscovered.
