Selective Replicated Declustering for Arbitrary Queries

Abstract

Data declustering is used to minimize query response times in data-intensive applications. In this technique, the query retrieval process is parallelized by distributing the data among several disks. It is useful in applications that access huge amounts of data, such as geographic information systems. Replicated declustering extends declustering by allowing replicas of data items to exist in the system. Many replicated declustering schemes have been proposed, and most of them generate two or more copies of every data item. However, some applications have very large data sizes, and even keeping two copies of all data items may not be feasible. In such systems, selective replication is a necessity. Furthermore, existing replication schemes are not designed to exploit query distribution information when such information is available. In this study, we propose a replicated declustering scheme for settings with limited replication capacity. The scheme decides both which data items to replicate and how to assign all data items to disks. We use the available query information to decide the replication and partitioning of the data, with the aim of optimizing the aggregate parallel response time. We propose and implement a Fiduccia-Mattheyses-like iterative improvement algorithm that obtains a two-way replicated declustering. We then use this algorithm in a recursive framework to generate a multi-way replicated declustering. Experiments with arbitrary queries on real datasets show that the proposed scheme outperforms existing replicated declustering schemes, especially under low replication constraints.
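The abstract names the objective, aggregate parallel response time, but does not define it. The Python sketch below makes one plausible reading concrete. Its assumptions are introduced here and are not taken from the paper:

- The response time of a query is the largest number of requested items that any single disk must read, when each item is served from the best choice among its replicas.
- The aggregate is the frequency-weighted sum of response times over a query set.

The sketch only evaluates a given replicated declustering; it is not the paper's Fiduccia-Mattheyses-like algorithm. Optimal replica selection uses a standard technique: binary search on the per-disk load bound, combined with capacitated bipartite matching via augmenting paths. The names `response_time`, `aggregate_response_time` and `two_way_response_time`, and the `replicas` layout, are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical data layout: item id -> tuple of disk ids (0..k-1) holding a copy.
# Every item is assumed to have at least one copy.
Replicas = Dict[int, Tuple[int, ...]]


def _place(item: int, replicas: Replicas, load: Dict[int, List[int]],
           cap: int, visited: Set[int]) -> bool:
    """Augmenting-path step: put `item` on one of its replica disks without
    exceeding `cap` items per disk, relocating already-placed items if needed."""
    for d in replicas[item]:
        if d in visited:
            continue
        visited.add(d)
        if len(load[d]) < cap:
            load[d].append(item)
            return True
        for idx, other in enumerate(load[d]):
            # Try to move `other` to a different replica disk, freeing a slot on d.
            if _place(other, replicas, load, cap, visited):
                load[d][idx] = item
                return True
    return False


def _feasible(items: Sequence[int], replicas: Replicas, k: int, cap: int) -> bool:
    """True if every item can be read with at most `cap` items per disk."""
    load: Dict[int, List[int]] = {d: [] for d in range(k)}
    return all(_place(i, replicas, load, cap, set()) for i in items)


def response_time(query: Sequence[int], replicas: Replicas, k: int) -> int:
    """Assumed parallel response time of one query: the minimum, over all
    replica choices, of the maximum number of items read from one disk."""
    items = sorted(set(query))
    if not items:
        return 0
    lo, hi = -(-len(items) // k), len(items)  # ceil(n/k) <= answer <= n
    while lo < hi:  # feasibility is monotone in the load bound
        mid = (lo + hi) // 2
        if _feasible(items, replicas, k, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def aggregate_response_time(queries: Sequence[Tuple[Sequence[int], int]],
                            replicas: Replicas, k: int) -> int:
    """Assumed objective: frequency-weighted sum of per-query response times."""
    return sum(freq * response_time(q, replicas, k) for q, freq in queries)


def two_way_response_time(query: Sequence[int], replicas: Replicas) -> int:
    """Closed form for k = 2: a items only on disk 0, b only on disk 1, r on both."""
    items = set(query)
    a = sum(1 for i in items if set(replicas[i]) == {0})
    b = sum(1 for i in items if set(replicas[i]) == {1})
    r = len(items) - a - b
    return max(a, b, -(-(a + b + r) // 2))


if __name__ == "__main__":
    replicas = {0: (0,), 1: (0,), 2: (1,), 3: (0, 1)}  # only item 3 is replicated
    queries = [([0, 1, 2, 3], 1), ([0, 1], 1), ([2, 3], 1)]
    print(aggregate_response_time(queries, replicas, 2))  # 5 (= 2 + 2 + 1)
```

For two disks, the closed form max(a, b, ceil((a + b + r) / 2)) holds because replicated items can always be routed to balance the two disks. A two-way iterative improvement pass could cheaply re-evaluate a quantity like this after each tentative move. That is an observation about the sketch, not a description of the paper's gain computation.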

Bibliographic details

Source
Euro-Par 2009 Parallel Processing | 2009 | pp. 375-386 | 12 pages
Conference location: Delft (NL)
Authors
K. Yasin Oktay; Ata Turk; Cevdet Aykanat
Author affiliations

Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey (all three authors)

Original format: PDF
Language: English
Chinese Library Classification: Distributed operating systems; parallel operating systems

Similar documents

1. Query-Log Aware Replicated Declustering [J]. Ata Turk, Kerim Yasin Oktay, Cevdet Aykanat. IEEE Transactions on Parallel and Distributed Systems, 2013, no. 5.
2. Efficient parallel processing of range queries through replicated declustering [J]. Hakan Ferhatosmanoglu, Ali Saman Tosun, Guadalupe Canahuate. Distributed and Parallel Databases, 2006, no. 2.
3. From Discrepancy to Declustering: Near-Optimal Multidimensional Declustering Strategies for Range Queries [J]. Chung-Min Chen, Christine T. Cheng. Journal of the Association for Computing Machinery, 2004, no. 1.
4. Selective Replicated Declustering for Arbitrary Queries [C]. K. Yasin Oktay, Ata Turk, Cevdet Aykanat. International Conference on Parallel Computing, 2009.
5. Query processing in spatial database systems: Declustering and clustering techniques [D]. Sivakumar Ravada. 1997.
6. Geneshot: search engine for ranking genes from arbitrary text queries [O]. Alexander Lachmann, Brian M Schilder, Megan L Wojciechowicz. 2019.
7. Selective Replicated Declustering for Arbitrary Queries [O]. K. Yasin Oktay, Ata Turk, Cevdet Aykanat. 2013.