System II: A Native RDF Repository Based on the Hypergraph Representation for RDF Data Model

Gang Wu; Juan-Zi Li; Jian-Qiang Hu; Ke-Hong Wang

Abstract

RDF is the data interchange layer for the Semantic Web. To manage the increasing amount of RDF data, an RDF repository should provide not only the necessary scalability and efficiency, but also sufficient inference capabilities. Though existing RDF repositories have made progress towards these goals, there is still ample room for improving overall performance. In this paper, we propose a native RDF repository, System II, to pursue a better tradeoff among system scalability, query efficiency, and inference capabilities. System II takes a hypergraph representation for RDF as the data model for its persistent storage, which effectively avoids the cost of data model transformation when accessing RDF data. Based on this native storage scheme, a set of efficient semantic query processing techniques is designed. First, several indices are built to accelerate RDF data access, including a value index, a labeling scheme for transitive closure computation, and three triple indices. Second, we propose a hybrid inference strategy under the pD* semantics to support inference for OWL-Lite with relatively low computational complexity. Finally, we extend the SPARQL algebra to explicitly express inference semantics in the logical query plan by defining some new algebra operators. In addition, MD5 hash values of URIs and a schema-level cache are introduced as practical implementation techniques. The results of performance evaluation on the LUBM benchmark and a real data set show that System II has a better combined metric value than other comparable systems.
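As a rough illustration of two of the techniques named in the abstract (MD5 hashing of URIs and multiple triple indices), the following minimal Python sketch keeps an in-memory value index plus three triple indices. The abstract does not specify which three orderings System II uses or how they are stored, so the subject/predicate/object orderings and the names `TripleStore`, `uri_key`, and `subjects` are assumptions made only for this example, not the authors' implementation.

```python
import hashlib
from collections import defaultdict


def uri_key(uri: str) -> bytes:
    """Return a fixed-length 16-byte key for a URI (its MD5 digest)."""
    return hashlib.md5(uri.encode("utf-8")).digest()


class TripleStore:
    """Toy store; the index layout is hypothetical, chosen for illustration."""

    def __init__(self):
        self.values = {}              # value index: key -> original URI
        self.spo = defaultdict(set)   # subject   -> {(predicate, object)}
        self.pos = defaultdict(set)   # predicate -> {(object, subject)}
        self.osp = defaultdict(set)   # object    -> {(subject, predicate)}

    def _intern(self, term: str) -> bytes:
        key = uri_key(term)
        self.values[key] = term
        return key

    def add(self, s: str, p: str, o: str) -> None:
        ks, kp, ko = self._intern(s), self._intern(p), self._intern(o)
        self.spo[ks].add((kp, ko))
        self.pos[kp].add((ko, ks))
        self.osp[ko].add((ks, kp))

    def subjects(self, p: str, o: str) -> list:
        """Answer the pattern (?s, p, o) through the predicate-first index."""
        kp, ko = uri_key(p), uri_key(o)
        return sorted(
            self.values[ks] for (obj, ks) in self.pos.get(kp, ()) if obj == ko
        )
```

Hashing gives every URI a key of the same length regardless of how long the URI is, which keeps index entries compact; the value index maps keys back to the original strings when results are returned.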

