Dynamic Data Exchange in Distributed RDF Stores

Anthony Potter; Boris Motik; Yavor Nenov; Ian Horrocks


Abstract

When RDF datasets become too large to be managed by centralised systems, they are often distributed in a cluster of shared-nothing servers, and queries are answered using a distributed join algorithm. Although such solutions have been extensively studied in relational and RDF databases, we argue that existing approaches exhibit two drawbacks. First, they usually decide statically (i.e., at query compile time) how to shuffle the data, which can lead to missed opportunities for local computation. Second, they often materialise large intermediate relations whose size is determined by the entire dataset (and not the data stored in each server), so these relations can easily exceed the memory of individual servers. As a possible remedy, we present a novel distributed join algorithm for RDF. Our approach decides when to shuffle data dynamically, which ensures that query answers that can be wholly produced within a server involve only local computation. It also uses a novel flow control mechanism to ensure that every query can be answered even if each server has a bounded amount of memory that is much smaller than the intermediate relations. We complement our algorithm with a new query planning approach that balances the cost of communication against the cost of local processing at each server. Moreover, as in several existing approaches, we distribute RDF data using graph partitioning so as to maximise local computation, but we refine the partitioning algorithm to produce more balanced partitions. We show empirically that our techniques can outperform the state of the art by orders of magnitude in terms of query evaluation times, network communication, and memory use. In particular, bounding the memory use in individual servers can mean the difference between success and failure for answering queries with large answer sets.
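
The idea of deciding at run time whether a partial answer has to leave its server can be illustrated with a toy simulation. The sketch below is not the algorithm from the paper: it is a simplified single-process model, and every name in it (`evaluate`, `owner`, the `?x` variable syntax) is hypothetical. It assumes triples are stored on the server that owns their subject, that each pattern after the first has its subject bound by earlier patterns, and it omits flow control and query planning entirely.

```python
from collections import defaultdict, deque


def is_var(term):
    """Variables are written with a leading '?' (an assumption of this sketch)."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(triples, owner, num_servers, patterns):
    """Return (answers, messages_sent) for a list of triple patterns."""
    # Each simulated server stores the triples whose subject it owns.
    store = defaultdict(list)
    for s, p, o in triples:
        store[owner[s]].append((s, p, o))

    answers, messages = [], 0
    # Work items: (server, index of the next pattern, partial binding).
    queue = deque((srv, 0, {}) for srv in range(num_servers))
    while queue:
        server, i, binding = queue.popleft()
        if i == len(patterns):
            answers.append(binding)
            continue
        subj = binding.get(patterns[i][0], patterns[i][0])
        if i > 0:
            if is_var(subj):
                raise ValueError("sketch assumes a bound subject after pattern 0")
            target = owner.get(subj)
            if target is None:
                continue  # no server stores triples about this subject
            if target != server:
                # Decided per partial answer, at run time: ship it only now.
                messages += 1
                queue.append((target, i, binding))
                continue
        # Otherwise the join step is purely local.
        for triple in store[server]:
            new = match(patterns[i], triple, binding)
            if new is not None:
                queue.append((server, i + 1, new))
    return answers, messages


triples = [("alice", "knows", "bob"), ("bob", "knows", "carol"),
           ("carol", "knows", "dave"), ("alice", "worksAt", "oxford")]
owner = {"alice": 0, "bob": 0, "carol": 1, "dave": 1, "oxford": 1}
query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
answers, messages = evaluate(triples, owner, 2, query)
```

In this example the answer through alice, bob and carol is produced entirely on server 0, while the partial answer that reaches carol is forwarded to server 1, so only one simulated message is counted.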


Bibliographic record

Source: IEEE Transactions on Knowledge and Data Engineering, 2018, no. 12, pp. 2312-2325 (14 pages)
Authors: Anthony Potter; Boris Motik; Yavor Nenov; Ian Horrocks
Author affiliation (all four authors): Department of Computer Science, University of Oxford, Oxford, United Kingdom
Original format: PDF
Language: English
Keywords: Servers; Resource description framework; Query processing; Heuristic algorithms; Partitioning algorithms

