Journal: Distributed and Parallel Databases

Efficient range query processing in metric spaces over highly distributed data

Abstract

Similarity search in P2P systems has attracted considerable attention recently, and several important applications, such as distributed image search, can benefit from the proposed distributed algorithms. In this paper, we address the challenging problem of efficiently processing range queries in metric spaces, where data is horizontally distributed across a super-peer network. Our approach relies on SIMPEER (Doulkeridis et al. in Proceedings of VLDB, pp. 986-997, 2007), a framework that dynamically clusters peer data in order to build distributed routing information at the super-peer level. SIMPEER allows the evaluation of exact range and nearest neighbor queries in a distributed manner that reduces communication cost, network latency, bandwidth consumption and computational overhead at each individual peer. In this paper, we extend SIMPEER by focusing on efficient range query processing and providing recall-based guarantees for the quality of the result retrieved so far. This is especially useful for range queries that produce result sets of high cardinality and incur high processing costs, where the complete result set would overwhelm the user. Our framework employs statistics to estimate an upper limit on the number of possible results for a range query, so that each super-peer may decide not to propagate the query further, thereby reducing the scope of the search. We provide an experimental evaluation of our framework and show that our approach performs efficiently, even under a high degree of distribution.
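The two ideas in the abstract, cluster-based routing and recall-based early termination, can be illustrated with a small single-process sketch. This is not SIMPEER's actual algorithm or API, which the abstract does not specify. The names `Cluster`, `may_contain_results`, `recall_lower_bound` and `range_query` are hypothetical, Euclidean distance stands in for an arbitrary metric, and plain cluster sizes stand in for the statistics the paper uses to bound the number of possible results.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance, used here as an example metric (assumption).
    return math.dist(a, b)


@dataclass
class Cluster:
    # Hypothetical cluster summary: every point lies within `radius` of `center`.
    center: Point
    radius: float
    points: List[Point]


def may_contain_results(c: Cluster, q: Point, r: float) -> bool:
    # By the triangle inequality, a point p in the cluster satisfies
    # d(q, p) >= d(q, center) - radius, so the cluster can only hold
    # results when d(q, center) <= r + radius.
    return dist(q, c.center) <= r + c.radius


def recall_lower_bound(found: int, remaining: int) -> float:
    # The true result count is at most found + remaining, where `remaining`
    # is an upper bound on the results still unseen. Hence
    # recall >= found / (found + remaining).
    total = found + remaining
    return 1.0 if total == 0 else found / total


def range_query(clusters: List[Cluster], q: Point, r: float,
                min_recall: float = 1.0) -> Tuple[List[Point], float]:
    # Keep only clusters that may intersect the query ball, nearest first.
    candidates = sorted(
        (c for c in clusters if may_contain_results(c, q, r)),
        key=lambda c: dist(q, c.center),
    )
    # Upper bound on possible results: sizes of all candidate clusters.
    remaining = sum(len(c.points) for c in candidates)
    results: List[Point] = []
    for c in candidates:
        # Stop propagating once the guaranteed recall is high enough.
        if recall_lower_bound(len(results), remaining) >= min_recall:
            break
        results.extend(p for p in c.points if dist(q, p) <= r)
        remaining -= len(c.points)
    return results, recall_lower_bound(len(results), remaining)
```

With `min_recall=1.0` the sketch visits every candidate cluster and returns the exact result set. A lower threshold lets the search stop early while still reporting a guaranteed lower bound on recall for the results retrieved so far.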