Journal: Computer Networks

Reprint of: The anatomy of a large-scale hypertextual web search engine


Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype, with a full text and hyperlink database of at least 24 million pages, is available.

To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from 3 years ago. This paper provides an in-depth description of our large-scale web search engine, the first such detailed public description we know of to date.

Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to effectively deal with uncontrolled hypertext collections, where anyone can publish anything they want.
