
Automatic discovery and selection of text resources on the Web, towards building a very large-scale and effective metasearch engine, Webscales.



Abstract

A metasearch engine is a system that supports unified access to multiple component search engines. We are exploring a complete set of technologies for building a very large-scale metasearch engine that can access up to hundreds of thousands of component search engines.

One major challenge is to identify search engines and to collect and maintain representative information from them. The problem is to find search engines on the Web and to build wrappers for them that enable automatic sending of queries and extraction of feature information. Because of the huge number of search engines involved, this must be done in a highly effective manner. A set of corresponding techniques is designed and developed to achieve accurate search engine wrapping. These techniques are highly automatic, efficient, and accurate.

Database selection is another major challenge in building a large-scale metasearch engine. The problem is to efficiently and accurately determine a small number of potentially useful component search engines to invoke for each user query. To enable accurate selection, metadata that reflect the contents of each search engine need to be collected and used. In this dissertation, a highly scalable and accurate database selection method is proposed. This method has several novel features. First, the metadata representing the contents of all search engines are organized into a single integrated representative, which yields both computational efficiency and storage efficiency. Second, the selection method is based on a theory for ranking search engines optimally. Experimental results indicate that this new method is very effective.

Furthermore, because the techniques described in this dissertation are highly automated, they also make metasearch engines easy to construct. With these techniques, a metasearch system can be built on top of a set of search engines with only a small amount of human involvement.
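The abstract does not spell out the selection method, so the following is only a minimal sketch of the general idea of database selection over a single integrated representative. It assumes, purely for illustration, that each search engine is summarized by per-term weights, that the integrated representative is one term-to-engine index, and that engines are ranked by the sum of their weights over the query terms. The function names, the weights, and the scoring rule are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def build_integrated_representative(engine_term_weights):
    """Merge per-engine term weights into one term -> {engine: weight} index.

    Hypothetical structure: the dissertation's actual representative
    may store different statistics.
    """
    index = defaultdict(dict)
    for engine, weights in engine_term_weights.items():
        for term, weight in weights.items():
            index[term][engine] = weight
    return index


def select_engines(index, query_terms, k):
    """Return up to k engines with the highest summed weight over the query terms.

    Assumed scoring rule (a simple sum), used here only to illustrate ranking.
    """
    scores = defaultdict(float)
    for term in query_terms:
        for engine, weight in index.get(term, {}).items():
            scores[engine] += weight
    # Sort by descending score; break ties by engine name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [engine for engine, _ in ranked[:k]]


# Made-up example data for three component search engines.
engines = {
    "engine_a": {"web": 0.9, "search": 0.2},
    "engine_b": {"web": 0.1},
    "engine_c": {"search": 0.8, "music": 0.5},
}
representative = build_integrated_representative(engines)
selected = select_engines(representative, ["web", "search"], k=2)
```

Keeping one shared index means a query touches only the entries for its own terms, rather than scanning a separate representative for every engine, which is one plausible reading of the efficiency claim in the abstract.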

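Bibliographic record

  • Author

    Wu, Zonghuan

  • Author affiliation

    State University of New York at Binghamton

  • Degree-granting institution State University of New York at Binghamton
  • Subject Computer Science
  • Degree Ph.D.
  • Year 2002
  • Pages 102 p.
  • Total pages 102
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Automation technology, computer technology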
