Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark

Abstract

The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands efficient, scalable, distributed storage and parallel processing strategies, along with high availability and fault tolerance, for its management and reuse. Existing work on distributed RDF data management systems leaves three open issues that have not been addressed together. The first is query efficiency. The second is that solutions are optimized for certain types of query patterns and do not necessarily work well for all of them. The third is reducing pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme for RDF data called Subset Property Table (SPT), which further partitions the existing Property Table approach into subsets of tables to minimize query input and join operations. We combine SPT with another existing model for storing RDF datasets, Vertical Partitioning (VP), and demonstrate that the combined (SPT + VP) approach outperforms state-of-the-art systems based on an in-memory processing engine in a distributed environment.
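
The two storage layouts named in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use Spark. The toy triples, the function names, and especially the `predicate_groups` argument are assumptions made for illustration, since the abstract does not say how SPT chooses its subsets. The sketch also assumes each subject has at most one value per predicate.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all values are made up.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("alice", "worksFor", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksFor", "acme"),
    ("acme", "label", "Acme Corp"),
]


def vertical_partitioning(triples):
    """VP: one two-column (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def property_table(triples):
    """Property Table: one wide row per subject, one column per predicate.

    Assumes a single value per (subject, predicate) pair.
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        rows[s][p] = o
    return dict(rows)


def subset_property_tables(triples, predicate_groups):
    """Hypothetical SPT: split the wide table into narrower tables.

    predicate_groups maps a table name to the predicates it holds. How the
    paper actually forms these subsets is not stated in the abstract.
    """
    wide = property_table(triples)
    subsets = {}
    for group_name, predicates in predicate_groups.items():
        table = {}
        for subject, row in wide.items():
            projected = {p: row[p] for p in predicates if p in row}
            if projected:  # skip subjects with none of the group's predicates
                table[subject] = projected
        subsets[group_name] = table
    return subsets


vp = vertical_partitioning(TRIPLES)
spt = subset_property_tables(
    TRIPLES,
    {"person": ["name", "age", "worksFor"], "org": ["label"]},
)

# Star-shaped lookup (name and employer per subject) answered from a single
# SPT table, without joining the per-predicate VP tables.
star = {
    s: (row["name"], row["worksFor"])
    for s, row in spt["person"].items()
    if "name" in row and "worksFor" in row
}
```

In this sketch, a query touching several predicates of the same subject reads only the narrow `person` table, whereas the same query over `vp` would have to join the `name` and `worksFor` tables on subject. That is one plausible reading of "minimize query input and join operations".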

Bibliographic record

Source
IEEE International Conference on Semantic Computing | 2019 | pp. 24-31 | 8 pages
Authors
Mahmudul Hassan; Srividya K. Bansal
Original format
PDF
Keywords
Resource description framework; Sparks; Distributed databases; Semantics; Loading; Engines
Date indexed
2022-08-26 13:53:16

Similar literature

1. RDF packages: a scheme for efficient reasoning and querying over large-scale RDF data [J]. Shohei Ohsawa, Toshiyuki Amagasa, Hiroyuki Kitagawa. International Journal of Web Information Systems. 2012, No. 2
2. A Robust Distributed Big Data Clustering-based on Adaptive Density Partitioning using Apache Spark [J]. Behrooz Hosseini, Kourosh Kiani. Symmetry. 2018, No. 8
3. A hierarchical indexing strategy for optimizing Apache Spark with HDFS to efficiently query big geospatial raster data [J]. Fei Hu, Chaowei Yang, Yongyao Jiang, et al. International Journal of Digital Earth. 2020, No. 1a3
4. Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark [C]. Mahmudul Hassan, Srividya K. Bansal. IEEE International Conference on Semantic Computing. 2019
5. Distributed RDF query processing and reasoning for Big Data Linked Data [D]. Perasani, Anudeep. 2014
6. SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases [O]. Hirokazu Chiba, Ikuo Uchiyama. 2017
7. Efficient and Customizable Data Partitioning Framework for Distributed Big RDF Data Processing in the Cloud [O]. Kisung Lee, Ling Liu, Yuzhe Tang, et al. 2015
