
Intelligent Web Navigation


Abstract

Virtual integration systems retrieve information according to the user's interests. This information is retrieved from several web applications, but it is presented to the user uniformly, in an online process. Therefore, response time is a significant factor. An essential part of any information retrieval system is navigation through pages. Web pages usually contain a large number of links; some of them lead to interesting information, but most serve other purposes, such as advertising or internal site navigation. Traditional crawlers follow every link on each page in order to analyze the target page and classify it as interesting or irrelevant. This means retrieving, analyzing and classifying thousands of pages for every single site, which is a costly task. This problem can be solved by combining a web page classifier, which distinguishes between interesting and irrelevant pages, with a link classifier, which automatically identifies links leading to interesting pages. This kind of navigation is more efficient and has a lower cost than that of traditional crawlers. Moreover, the navigation model is automatically extracted from the site instead of being handcrafted, which reduces the supervision required from the user.
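
The difference between following every link and following only the links that a link classifier selects can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `SITE`, the keyword rules in `is_interesting_page` and `is_promising_link` (stand-ins for trained classifiers), and the `crawl` helper are hypothetical and are not taken from the paper.

```python
from collections import deque

# Hypothetical in-memory site: URL -> (page text, [(anchor text, target URL), ...]).
SITE = {
    "/": ("Welcome", [("Products", "/products"),
                      ("Advertise with us", "/ads"),
                      ("About", "/about")]),
    "/products": ("Product list", [("Laptop details", "/p/1"),
                                   ("Phone details", "/p/2"),
                                   ("Home", "/")]),
    "/p/1": ("Laptop price: 900", []),
    "/p/2": ("Phone price: 500", []),
    "/ads": ("Sponsored banners", [("Partner offers", "/ads/partners")]),
    "/ads/partners": ("Partner list", []),
    "/about": ("Company history", []),
}


def is_interesting_page(text):
    # Stand-in page classifier (assumption): a page is interesting if it shows a price.
    return "price" in text.lower()


def is_promising_link(anchor):
    # Stand-in link classifier (assumption): decide from the anchor text alone,
    # without fetching the target page.
    anchor = anchor.lower()
    return "product" in anchor or "details" in anchor


def crawl(start, follow_link):
    """Breadth-first crawl; returns (pages fetched, interesting URLs found)."""
    seen, queue = {start}, deque([start])
    fetched, hits = 0, []
    while queue:
        url = queue.popleft()
        text, links = SITE[url]  # simulates retrieving the page
        fetched += 1
        if is_interesting_page(text):
            hits.append(url)
        for anchor, target in links:
            if target not in seen and follow_link(anchor):
                seen.add(target)
                queue.append(target)
    return fetched, hits


exhaustive = crawl("/", lambda anchor: True)  # traditional: follow every link
focused = crawl("/", is_promising_link)       # follow only classifier-approved links
print(exhaustive)  # (7, ['/p/1', '/p/2'])
print(focused)     # (4, ['/p/1', '/p/2'])
```

On this toy site both strategies find the same two interesting pages, but the focused crawl retrieves 4 pages instead of 7. On a real site the saving depends entirely on how accurate the link classifier is; a link wrongly rejected hides every page behind it.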
