Proceedings of the Tenth International Conference on Information and Knowledge Management

Predicting the cost-quality trade-off for information retrieval queries


Abstract

Efficient, flexible, and scalable integration of full-text information retrieval (IR) in a DBMS is not trivial. This holds in particular for query optimization in such a context. To facilitate the bulk-oriented behavior of database query processing, a priori knowledge of how to limit the data efficiently prior to query evaluation is very valuable at optimization time. The usually imprecise nature of IR querying provides an extra opportunity to limit the data by trading off against the quality of the answer. In this paper we present a mathematically derived model to predict the quality implications of neglecting information before query execution. In particular, we investigate the possibility of predicting the retrieval quality for a document collection for which no training information is available, which is usually the case in practice. Instead, we construct a model that can be trained on other document collections for which the necessary quality information is available or can be obtained quite easily. We validate our model on several document collections and present the experimental results. These results show that our model performs quite well, even in the case where we did not train it on the test collection itself.
