首页> 外文期刊>Journal of the Association for Computing Machinery >From Discrepancy to Declustering: Near-Optimal Multidimensional Declustering Strategies for Range Queries
【24h】

From Discrepancy to Declustering: Near-Optimal Multidimensional Declustering Strategies for Range Queries

机译:从差异到聚类:范围查询的近最佳多维聚类策略

获取原文
获取原文并翻译 | 示例
           

摘要

Declustering schemes allocate data blocks among multiple disks to enable parallel retrieval. Given a declustering scheme D, its response time with respect to a query Q, rt(Q), is defined to be the maximum number of data blocks of the query stored by the scheme in any one of the disks. If |Q| is the number of data blocks in Q and M is the number of disks, then rt(Q) is at least [|Q|/M]. One way to evaluate the performance of D with respect to a set of range queries Q is to measure its additive error-the maximum difference of rt(Q) from [|Q|/M] over all range queries Q ∈ Q. In this article, we consider the problem of designing declustering schemes for uniform multidimensional data arranged in a d-dimensional grid so that their additive errors with respect to range queries are as small as possible. It has been shown that for a fixed dimension d ≥ 2, any declustering scheme on an M~d grid, a grid with length M on each dimension, will always incur an additive error with respect to range queries of Ω(log M) when d = 2 and Ω(log~((d-1)/2)M) when d > 2. Asymptotically optimal declustering schemes exist for 2-dimensional data. However, the best general upper bound known so far for the worst-case additive errors of d-dimensional declustering schemes, d ≥ 3, is O(M~(d-1)), which is large when compared to the lower bound. In this article, we propose two declustering schemes based on low-discrepancy points in d-dimensions. When d is fixed, both schemes have an additive error of O(log~(d-1)M) with respect to range queries, provided that certain conditions are satisfied: the first scheme requires that the side lengths of the grid grow at a rate polynomial in M, while the second scheme requires d ≥ 2 and M = p~t where d ≤ p ? C, C a constant, and t is a positive integer such that t(d - 1) ≥ 2. These are the first multidimensional declustering schemes with additive errors proven to be near optimal.
机译:群集方案在多个磁盘之间分配数据块,以实现并行检索。给定一个分簇方案D,它对查询Q的响应时间rt(Q)定义为该方案在任何一个磁盘中存储的查询数据块的最大数量。如果| Q |是Q中的数据块数,M是磁盘数,则rt(Q)至少为[| Q | / M]。评估D对一组范围查询Q的性能的一种方法是测量其加性误差-在所有范围查询Q∈Q上rt(Q)与[| Q | / M]的最大差。在本文中,我们考虑了为在d维网格中排列的统一多维数据设计解聚方案的问题,以使它们在范围查询方面的附加误差尽可能小。已经证明,对于固定维数d≥2,当M〜d网格上的任何解簇方案(每个维上长度为M的网格)在Ω(log M)的范围查询时总是会产生加性误差当d> 2时,d = 2且为Ω(log〜((d-1)/ 2)M)。对于二维数据,存在渐近最优解聚方案。但是,到目前为止,对于d维聚簇方案的最坏情况加法误差d≥3而言,已知的最佳一般上限为O(M〜(d-1)),与下限相比较大。在本文中,我们提出了两种基于d维低差异点的去聚方案。当d为固定值时,如果满足某些条件,则两种方案的距离查询的相加误差均为O(log〜(d-1)M):第一种方案要求网格的边长以a率多项式以M为单位,而第二种方案要求d≥2且M = p〜t,其中d≤p? C,C为常数,t为正整数,使得t(d-1)≥2。这是第一个多维去簇方案,其加法误差被证明接近最佳。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号