Venue: IEEE International Conference on Semantic Computing

Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark


Abstract

The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands efficient, scalable, distributed storage and parallel processing strategies, along with high availability and fault tolerance, for its management and reuse. Existing work on distributed RDF data management systems does not address three open issues together. The first is query efficiency. The second is that solutions are optimized for certain types of query patterns and do not necessarily work well for all of them. The third is reducing pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme for RDF data called Subset Property Table (SPT), which further partitions the existing Property Table approach into subsets of tables to minimize query input and join operations. We combine SPT with another existing model for storing RDF datasets, Vertical Partitioning (VP), and demonstrate that the combined (SPT + VP) approach outperforms state-of-the-art systems based on in-memory processing engines in a distributed environment.
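
To make the two storage layouts named in the abstract concrete, the following is a minimal sketch in plain Python rather than Spark. It shows Vertical Partitioning as one (subject, object) table per predicate, and a property table restricted to a subset of predicates. The abstract does not say how SPT chooses its subsets, so the `GROUPS` mapping, the toy triples, and all function names here are assumptions made for illustration and are not the authors' implementation.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); every value is made up.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "age", "25"),
    ("paper1", "title", "RDF on Spark"),
    ("paper1", "author", "alice"),
]

# Hypothetical grouping of predicates into subsets. The abstract does not
# specify the grouping criterion, so this mapping is an assumption.
GROUPS = {
    "person": ["name", "age"],
    "paper": ["title", "author"],
}


def vertical_partition(triples):
    """Vertical Partitioning: one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def property_table(triples, predicates):
    """Property table over `predicates`: one row per subject, one column
    per predicate. Subjects with none of the predicates get no row.
    Simplification: a multi-valued predicate keeps only its last value."""
    rows = defaultdict(dict)
    for s, p, o in triples:
        if p in predicates:
            rows[s][p] = o
    # Fill missing columns with None so every row has the same schema.
    return {s: {p: cols.get(p) for p in predicates} for s, cols in rows.items()}


def subset_property_tables(triples, groups):
    """Build one smaller property table per predicate subset."""
    return {name: property_table(triples, preds) for name, preds in groups.items()}


vp = vertical_partition(TRIPLES)
spt = subset_property_tables(TRIPLES, GROUPS)
```

Under these assumptions, a query asking for the `name` and `age` of the same subject reads only the small `person` table and needs no join, while a pattern on `knows`, which belongs to no subset, falls back to its VP table. This is one way to read the abstract's goal of minimizing query input and join operations.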
