Conference: Euro-Par 2009 Parallel Processing

Selective Replicated Declustering for Arbitrary Queries

Abstract

Data declustering is used to minimize query response times in data-intensive applications. In this technique, the query retrieval process is parallelized by distributing the data among several disks; it is useful in applications, such as geographic information systems, that access huge amounts of data. Declustering with replication extends declustering by allowing replicas of data items in the system. Many replicated declustering schemes have been proposed, and most of them generate two or more copies of all data items. However, some applications have very large data sizes, and even keeping two copies of every data item may not be feasible. In such systems, selective replication is a necessity. Furthermore, existing replication schemes are not designed to utilize query distribution information when such information is available. In this study, we propose a replicated declustering scheme that decides both which data items to replicate and how to assign all data items to disks when replication capacity is limited. We use available query information to decide the replication and partitioning of the data, and we try to optimize the aggregate parallel response time. We propose and implement a Fiduccia-Mattheyses-like iterative improvement algorithm to obtain a two-way replicated declustering, and we use this algorithm in a recursive framework to generate a multi-way replicated declustering. Experiments conducted with arbitrary queries on real datasets show that, especially for low replication constraints, the proposed scheme yields better performance than existing replicated declustering schemes.
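To make the optimization objective concrete, the following minimal Python sketch evaluates an aggregate parallel response time for a two-disk placement with selective replication. It is an illustration under stated assumptions, not the algorithm from the paper: the cost model (one item retrieved per disk per step, each query weighted by a frequency) and all names (`query_response_time`, `aggregate_response_time`, `placement`, the toy items and queries) are hypothetical. It covers only the cost evaluation that an iterative improvement heuristic would try to reduce, not the Fiduccia-Mattheyses-like moves themselves.

```python
from math import ceil


def query_response_time(query, placement):
    """Fewest parallel retrieval steps for one query on two disks.

    Assumed model: each disk returns one item per step. `placement` maps
    an item to the set of disks holding it, drawn from {0, 1}; an item
    stored on both disks is a replica and may be read from either one.
    """
    only0 = sum(1 for item in query if placement[item] == {0})
    only1 = sum(1 for item in query if placement[item] == {1})
    total = len(query)  # items in a query are assumed distinct
    # Replicated items can be split freely between the disks, so the best
    # schedule is limited by each disk's forced load and by an even split.
    return max(only0, only1, ceil(total / 2))


def aggregate_response_time(queries, placement):
    """Frequency-weighted sum of per-query response times."""
    return sum(freq * query_response_time(q, placement) for q, freq in queries)


# Hypothetical workload: (items requested, relative frequency).
queries = [(("a", "b", "c"), 1), (("a", "b"), 2), (("c", "d"), 1)]

# Placement without replication, and one that spends a single replica on "b".
no_replication = {"a": {0}, "b": {0}, "c": {1}, "d": {1}}
replicate_b = {"a": {0}, "b": {0, 1}, "c": {1}, "d": {1}}

cost_plain = aggregate_response_time(queries, no_replication)
cost_replicated = aggregate_response_time(queries, replicate_b)
print(cost_plain, cost_replicated)
```

In this toy workload, replicating only the item shared by the most frequent query lowers the aggregate cost, which is the intuition behind using query information to choose replicas when capacity is limited.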
