Database selection in distributed information retrieval: A study of multi-collection information retrieval.

Abstract

The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multi-collection environment. Multi-collection searching includes distributed searching as a special case but is defined more broadly here to cover searching partitioned content independently of its physical storage. It is cast in three parts: collection selection (also referred to as database selection), which decides where a query should be sent; query processing, which executes the query at each selected collection; and results merging, which combines the results from individual collections into a single coherent list for the searcher. We focus our attention on collection selection.

We compare a number of different collection selection approaches and examine the effect of collection selection on document retrieval performance. We consider multi-collection retrieval in six different test environments utilizing three document testbeds. Considering collection selection in isolation, we find that effective collection selection can be achieved using limited information about each collection. We then turn our attention from selection alone to data item retrieval in a multi-collection environment, considering retrieval performance in the same six test environments. First, we find that good collection selection has the potential to result in better retrieval effectiveness than can be achieved in an equivalent single collection. Second, we find that good performance can be achieved when only a few collections are selected and that performance generally increases as more collections are selected. Finally, we find that when collection selection is employed, it may not be necessary to maintain collection-wide information (CWI), e.g., global idf; local information can be used to achieve equivalent performance. This means that multi-collection systems can be engineered with more autonomy and less cooperation.

This work demonstrates that improvements in collection selection can lead to broader improvements in document retrieval performance.
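
The three-part decomposition described in the abstract (collection selection, query processing, results merging) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the function names, the toy selection score (the fraction of a collection's documents containing each query term, summed over terms), the tf-idf document score, and the raw-score merge. The idf is computed from each collection's own statistics only, as a loose illustration of using local rather than collection-wide information. Collections are assumed to be non-empty.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def doc_freqs(docs):
    """Number of documents in one collection that contain each term."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def select_collections(query, collections, k):
    """Step 1: keep the k collections with the highest toy selection score."""
    terms = tokenize(query)
    scores = {}
    for name, docs in collections.items():
        df = doc_freqs(docs)
        # Hypothetical score: summed fraction of documents containing each term.
        scores[name] = sum(df[t] / len(docs) for t in terms)
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


def search_collection(query, docs):
    """Step 2: score documents with tf * idf, using only local statistics."""
    terms = tokenize(query)
    df = doc_freqs(docs)
    n = len(docs)
    results = []
    for doc in docs:
        tf = Counter(tokenize(doc))
        # Terms absent from this collection (df == 0) contribute nothing.
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in terms if df[t])
        if score > 0:
            results.append((score, doc))
    return results


def multi_collection_search(query, collections, k):
    """Steps 1-3: select, query each selected collection, merge by raw score."""
    merged = []
    for name in select_collections(query, collections, k):
        for score, doc in search_collection(query, collections[name]):
            merged.append((score, name, doc))
    merged.sort(key=lambda r: (-r[0], r[1], r[2]))
    return merged


# Made-up toy data, for illustration only.
collections = {
    "cooking": ["bread baking basics", "slow roast chicken"],
    "computing": ["distributed query processing", "query ranking in search engines"],
    "travel": ["train routes in europe", "budget travel planning"],
}
hits = multi_collection_search("query processing", collections, k=1)
for score, name, doc in hits:
    print(f"{score:.3f}  {name}  {doc}")
```

Merging on raw scores is the simplest possible choice and is used here only to keep the sketch short; scores computed from different local statistics are not generally comparable across collections, which is part of why results merging is treated as its own step.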

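Bibliographic details

  • Author

    Powell, Allison Lane

  • Author affiliation

    University of Virginia

  • Degree-granting institution: University of Virginia
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 250 p.
  • Total pages: 250
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology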