Journal: Information Processing & Management

Updating broken web links: An automatic recommendation system



Abstract

Broken hypertext links are a frequent problem on the Web. Sometimes the page that a link points to has disappeared forever, but in many other cases the page has simply been moved to another location on the same web site or to a different site. In some cases the page, besides being moved, has been updated, so that it differs slightly from the original but remains quite similar to it. In all these cases it can be very useful to have a tool that provides pages highly related to the broken link, from which we can select the most appropriate one. The relationship between a broken link and its candidate replacement pages can be defined as a function of many factors. In this work we have employed several resources, both in the context of the link and on the Web, to look for pages related to a broken link. From the context of the link, we have analyzed several sources of information: the anchor text, the text surrounding the anchor, the URL, and the page containing the link. We have also extracted information about the link from the Web infrastructure, such as search engines, Internet archives, and social tagging systems. We have combined all of these resources to design a system that recommends pages that can be used to recover the broken link. A novel methodology is presented to evaluate the system without resorting to user judgments, which increases the objectivity of the results and helps to adjust the parameters of the algorithm. We have also compiled a web page collection with true broken links, which has been used for testing of the full system by human evaluators. Results show that the system is able to recommend the correct page among the first ten results when the page has been moved, and to recommend highly related pages when the original one has disappeared.
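
The abstract does not specify how candidate pages are scored, so the following is only a minimal sketch of one plausible ranking step, not the authors' method. It assumes a cosine similarity over raw term frequencies, with the anchor text weighted more heavily than the surrounding text. The function names, the `anchor_weight` parameter, and the example URLs are all hypothetical; the cutoff `top_k=10` merely mirrors the "first ten results" mentioned above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(anchor_text, context_text, candidates,
                    anchor_weight=2, top_k=10):
    """Rank candidate pages against the context of a broken link.

    candidates maps a URL to the plain text of that page.
    anchor_weight is an assumed boost for anchor-text terms.
    Returns up to top_k (url, score) pairs, best first.
    """
    # Build the query vector: anchor terms count anchor_weight times each,
    # terms from the surrounding text count once each.
    query = Counter()
    for term in tokenize(anchor_text):
        query[term] += anchor_weight
    query.update(tokenize(context_text))

    scored = [(cosine(query, Counter(tokenize(text))), url)
              for url, text in candidates.items()]
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(url, score) for score, url in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical candidate pages, e.g. as returned by a search engine.
    pages = {
        "http://example.org/new/solar-energy": "solar energy panels guide installation",
        "http://example.org/cooking": "recipes for pasta and bread",
    }
    for url, score in rank_candidates("solar energy guide",
                                      "Read our guide on installing panels",
                                      pages):
        print(f"{score:.3f}  {url}")
```

In a fuller system, the candidate set would come from the resources listed in the abstract (search engines, Internet archives, social tagging systems), and the weighting would be tuned with the evaluation methodology the authors describe.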
