首页> 外文期刊>ACM transactions on database systems >Evaluating Top-k Queries Over Web-Accessible Databases
【24h】

Evaluating Top-k Queries Over Web-Accessible Databases

机译:通过Web可访问的数据库评估前k个查询

获取原文
获取原文并翻译 | 示例
获取外文期刊封面目录资料

摘要

A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints. We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.
机译:对Web搜索引擎的查询通常由关键字列表组成,搜索引擎用关键字的最佳或“前” k个页面对关键字进行响应。通常,此top-k查询模型普遍存在于多媒体集合中,但对于某些应用程序,也普遍存在于普通关系数据中。例如,考虑与可用餐厅信息有关的关系,包括其位置,一个餐厅的价格范围以及整体食品等级。查询这种关系的用户可以简单地指定用户的位置和目标价格范围,并根据与用户的接近程度,与目标价格范围的匹配程度以及整体食品等级的组合,期望获得最佳的10家餐厅。出于多种原因,有效处理前k个查询具有挑战性。一个关键的原因是,在许多Web应用程序中,除了通过外部可通过Web访问的表单界面外,关系属性可能不可用,我们将不得不反复查询潜在的大量候选对象集。在本文中,我们研究如何在此设置中有效地处理前k个查询,在该设置中,用户指定目标值的属性可能由具有各种访问接口的外部自主源处理。我们提出了一种用于处理此类查询的顺序算法,但是您会发现,由于Web访问展现出高且可变的延迟,因此任何顺序的top-k查询处理策略都必然需要不必要的长查询处理时间。幸运的是,尽管可以对Web源进行并行探测,但是每个源通常可以处理并发请求,尽管这些源可能会对它们愿意接受的探测的类型和数量施加一些限制。我们采用顺序查询处理技术,并引入了一种有效的算法,该算法可以在满足源访问约束的同时,最大化源访问并行性以最小化查询响应时间。我们使用合成数据和实际可通过网络访问的数据实验性地评估了我们的技术,并表明并行算法比顺序算法可以有效得多。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号