The problem of declustering, that is, how to distribute a binary Cartesian product file on multiple disks to maximize the parallelism for partial match queries, is examined. Cartesian product files appear as a result of some secondary key access methods. For the binary case, the problem is reduced to grouping the 2/sup n/ binary strings on n bits in m groups of unsimilar strings. It is proposed that the strings be grouped such that these group forms an error correcting code (ECC). This construction guarantees that the strings of a given group will have large Hamming distances, i.e., they will differ in many bit positions. Intuitively, this should result in good declustering. The authors describe how to build a declustering scheme using an ECC, and prove a theorem that gives a necessary condition for the proposed method to be optimal. Analytical results show that the proposed method is superior to older heuristics, and that it is very close to the theoretical (nontight) bound.
展开▼
机译:研究了去簇问题,即如何在多个磁盘上分配二进制笛卡尔乘积文件,以使部分匹配查询的并行度最大化。笛卡尔积文件的出现是某些辅助密钥访问方法的结果。对于二进制情况,问题被简化为在m组不相似的字符串中的n位上将2 / sup n /个二进制字符串分组。建议对字符串进行分组,以使这些分组形成纠错码(ECC)。这种构造保证了给定组的弦将具有大的汉明距离,即,它们在许多位位置上将不同。凭直觉,这将导致良好的去簇。作者描述了如何使用ECC构建解聚方案,并证明了一个定理,该定理为该方法的最佳化提供了必要条件。分析结果表明,所提出的方法优于旧的启发式算法,并且非常接近理论上的(不严格的)界限。
展开▼