
Looking for a haystack: Selecting data sources in a distributed retrieval system.



Abstract

The Internet contains billions of documents and thousands of systems for searching over these documents. Searching for a useful document can be as difficult as the proverbial search for a needle in a haystack. Each search engine provides access to a different collection of documents. Collections may be large or small, focused or comprehensive. Focused collections may be centered on any possible topic, and comprehensive collections typically have particular topical areas with higher concentrations of documents. Some of these collections overlap, but many documents are available from only a single collection. To find the most needles, one must first select the best haystacks.

This dissertation develops a framework for automatic selection of search engines. In this framework, the collection underlying each search engine is examined to determine how properties such as central topic, size, and degree of focus affect retrieval performance. When measured with appropriate techniques, these properties may be used to predict performance. A new distributed retrieval algorithm that takes advantage of this knowledge is presented and compared to existing retrieval algorithms.
