Journal: Distributed and Parallel Databases

Efficient parallel processing of range queries through replicated declustering

Abstract

A common technique used to minimize I/O in data-intensive applications is data declustering over parallel servers. This technique distributes data among several disks so as to parallelize query retrieval and thus improve performance. We focus on optimizing access to large spatial data and on the most common type of query on such data, the range query. An optimal declustering scheme is one in which the processing of every range query is balanced uniformly among the available disks. It has been shown that single-copy declustering schemes are non-optimal for range queries. In this paper, we integrate replication with parallel disk declustering for efficient processing of range queries. We note that replication is widely used in database applications for purposes such as load balancing, fault tolerance, and data availability. We develop theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering that uses a limited amount of replication, and we provide extensions for applying it to real data, including arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of the data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques, for both single queries and multiple queries, on synthetic and real data sets.
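
The idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes a two-dimensional grid of buckets and a periodic allocation of the form f(i, j) = (a*i + b*j + c) mod N, and it assumes that the retrieval cost of a range query is the largest number of buckets read from any single disk. The coefficients, the 4-disk setup, and all function names are hypothetical choices made for illustration. With two copies, the cost is found by checking whether each bucket can be assigned to one of its copies without exceeding a per-disk cap, using a simple augmenting-path search.

```python
from math import ceil


def periodic_allocation(a, b, c, n_disks):
    """Return f(i, j) = (a*i + b*j + c) mod n_disks (assumed form)."""
    def f(i, j):
        return (a * i + b * j + c) % n_disks
    return f


def range_query(i0, j0, rows, cols):
    """List the grid buckets covered by a rows x cols range query."""
    return [(i, j) for i in range(i0, i0 + rows) for j in range(j0, j0 + cols)]


def single_copy_cost(buckets, f, n_disks):
    """Cost with one copy: the heaviest per-disk load."""
    load = [0] * n_disks
    for i, j in buckets:
        load[f(i, j)] += 1
    return max(load)


def feasible(buckets, copies, n_disks, cap):
    """True if every bucket can be read from one of its copies
    with at most `cap` reads per disk (augmenting-path matching)."""
    assigned = [[] for _ in range(n_disks)]  # bucket indices per disk

    def try_assign(b, seen):
        for d in {f(*buckets[b]) for f in copies}:
            if d in seen:
                continue
            seen.add(d)
            if len(assigned[d]) < cap:
                assigned[d].append(b)
                return True
            # Disk d is full: try to move one of its buckets elsewhere.
            for k, other in enumerate(assigned[d]):
                if try_assign(other, seen):
                    assigned[d][k] = b
                    return True
        return False

    return all(try_assign(b, set()) for b in range(len(buckets)))


def replicated_cost(buckets, copies, n_disks):
    """Smallest per-disk cap for which a valid assignment exists."""
    cap = ceil(len(buckets) / n_disks)  # lower bound on any scheme
    while not feasible(buckets, copies, n_disks, cap):
        cap += 1
    return cap


N = 4                                  # hypothetical number of disks
f1 = periodic_allocation(1, 1, 0, N)   # first copy: (i + j) mod 4
f2 = periodic_allocation(1, 3, 0, N)   # second copy: (i + 3j) mod 4
q = range_query(0, 0, 2, 2)            # a 2 x 2 range query

print(single_copy_cost(q, f1, N))      # 2, while the lower bound is 1
print(replicated_cost(q, [f1, f2], N)) # 1, the lower bound is reached
```

In this toy case the single copy places two of the four buckets on the same disk, so the query needs two parallel reads. With the second copy available, each bucket can be read from a different disk, and the query completes in one parallel read.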