WebCrawler: Finding what people want.

Abstract

WebCrawler, the first comprehensive full-text search engine for the World-Wide Web, has played a fundamental role in making the Web easier to use for millions of people. Its invention and subsequent evolution, spanning a three-year period, helped fuel the Web's growth by creating a new way of navigating hypertext.

Before search engines like WebCrawler, users found Web documents by following hypertext links from one document to another. When the Web was small and its documents shared the same fundamental purpose, users could find documents with relative ease. However, the Web quickly grew to millions of pages, making navigation difficult. WebCrawler assists users in their Web navigation by automating the task of link traversal, creating a searchable index of the Web, and fulfilling searchers' queries from the index. To use WebCrawler, a user issues a query to a pre-computed index, quickly retrieving a list of documents that match the query.

This dissertation describes WebCrawler's scientific contributions: a method for choosing a subset of the Web to index; an approach to creating a search service that is easy to use; a new way to rank search results that can generate highly effective results for both naive and expert searchers; and an architecture for the service that has effectively handled a three-order-of-magnitude increase in load.

This dissertation also describes how WebCrawler evolved to accommodate the extraordinary growth of the Web. This growth affected WebCrawler not only by increasing the size and scope of its index, but also by increasing the demand for its service. Each of WebCrawler's components had to change to accommodate this growth: the crawler had to download more documents, the full-text index had to become more efficient at storing and finding those documents, and the service had to accommodate heavier demand. Such changes were not only related to scale, however: the evolving nature of the Web meant that functional changes were necessary, too, such as the ability to handle naive queries from searchers.
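
As an illustration only, the three steps the abstract names (link traversal, building a searchable index, and answering queries from that pre-computed index) can be sketched in a few lines of Python. This is not WebCrawler's implementation: the in-memory `PAGES` table, the function names `crawl`, `build_index`, and `search`, and the breadth-first traversal and all-words matching are assumptions chosen to keep the sketch small.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.html": ("web search engines index documents", ["b.html", "c.html"]),
    "b.html": ("crawlers follow hypertext links", ["c.html"]),
    "c.html": ("an index answers search queries quickly", ["a.html"]),
    "d.html": ("unreachable page about search", []),
}


def crawl(seed, pages):
    """Breadth-first link traversal from seed; returns URLs in visit order."""
    seen, order, queue = {seed}, [], deque([seed])
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in pages[url][1]:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls, pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = {}
    for url in urls:
        for word in pages[url][0].lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


# The index is computed once, ahead of time; queries only read it.
INDEX = build_index(crawl("a.html", PAGES), PAGES)
print(search(INDEX, "hypertext links"))
```

Because `d.html` is never linked from the seed, it is not crawled and so never appears in results, which mirrors the abstract's point that only a chosen subset of the Web is indexed.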

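Bibliographic record

  • Author

    Pinkerton, Brian

  • Author affiliation

    University of Washington

  • Degree-granting institution: University of Washington
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2000
  • Pages: 94 p.
  • Total pages: 94
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Automation technology, computer technology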