Storing, Indexing and Querying Large Provenance Data Sets as RDF Graphs in Apache HBase

Abstract

Provenance, which records the history of an in-silico experiment, has been identified as an important requirement for scientific workflows to support scientific discovery reproducibility, result interpretation, and problem diagnosis. Large provenance datasets are composed of many smaller provenance graphs, each of which corresponds to a single workflow execution. In this work, we explore and address the challenge of efficiently and scalably storing and querying large collections of provenance graphs serialized as RDF graphs in an Apache HBase database. Specifically, we propose: (i) novel storage and indexing techniques for RDF data in HBase that are better suited to provenance datasets than to generic RDF graphs, and (ii) novel SPARQL query evaluation algorithms that rely solely on indices to compute expensive join operations, use numeric values that represent triple positions rather than actual triples, and eliminate the need for intermediate data transfers over a network. An empirical evaluation of our algorithms using provenance datasets and queries from the University of Texas Provenance Benchmark confirms that our approach is efficient and scalable.
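The abstract says joins are computed solely from indices, over numeric values that stand for triple positions. The following is a minimal, hypothetical Python sketch of that general idea, not the authors' HBase schema or algorithm. Plain in-memory dicts stand in for HBase index tables, a triple's position in a list serves as its numeric ID, and the names `TRIPLES`, `build_indices`, `join_object_subject`, and `resolve` (and the toy `ex:` data) are illustrative assumptions. The sketch evaluates `?data prov:wasGeneratedBy ?run . ?run prov:used ?input` by joining on `?run` with index lookups alone and reads actual triples only when resolving the final answers.

```python
# Hypothetical illustration only: in-memory dicts stand in for HBase index
# tables, and a triple's position in TRIPLES serves as its numeric ID.

# Toy provenance graph for two workflow runs, as (subject, predicate, object).
TRIPLES = [
    ("ex:run1", "prov:used", "ex:data1"),
    ("ex:run1", "prov:wasAssociatedWith", "ex:agentA"),
    ("ex:data2", "prov:wasGeneratedBy", "ex:run1"),
    ("ex:run2", "prov:used", "ex:data2"),
    ("ex:data3", "prov:wasGeneratedBy", "ex:run2"),
]


def build_indices(triples):
    """Map predicate -> term -> list of triple positions (integers only)."""
    by_pred_subj = {}  # predicate -> subject -> [positions]
    by_pred_obj = {}   # predicate -> object  -> [positions]
    for pos, (s, p, o) in enumerate(triples):
        by_pred_subj.setdefault(p, {}).setdefault(s, []).append(pos)
        by_pred_obj.setdefault(p, {}).setdefault(o, []).append(pos)
    return by_pred_subj, by_pred_obj


def join_object_subject(p1, p2, by_pred_subj, by_pred_obj):
    """Evaluate  ?x p1 ?y . ?y p2 ?z  by index lookups alone.

    Returns (left_pos, right_pos) pairs; no triple is read here.
    """
    left = by_pred_obj.get(p1, {})    # ?y -> positions of (?x p1 ?y)
    right = by_pred_subj.get(p2, {})  # ?y -> positions of (?y p2 ?z)
    pairs = []
    for y, left_positions in left.items():
        for lpos in left_positions:
            for rpos in right.get(y, []):
                pairs.append((lpos, rpos))
    return pairs


def resolve(pairs, triples):
    """Only now fetch triples, turning positions into (?x, ?y, ?z) bindings."""
    return [(triples[lpos][0], triples[lpos][2], triples[rpos][2])
            for lpos, rpos in pairs]


if __name__ == "__main__":
    subj_idx, obj_idx = build_indices(TRIPLES)
    # ?data prov:wasGeneratedBy ?run . ?run prov:used ?input
    pairs = join_object_subject("prov:wasGeneratedBy", "prov:used", subj_idx, obj_idx)
    print(pairs)  # [(2, 0), (4, 3)]
    for binding in resolve(pairs, TRIPLES):
        print(binding)
```

Intermediate results here are pairs of integers rather than full triples, and the triples themselves are touched only in `resolve`. That is the design choice the sketch is meant to highlight; how the paper actually lays out its indices in HBase is not reproduced.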

Bibliographic details

Source: IEEE World Congress on Services, 2013, pp. 1-8 (8 pages)
Authors: Artem Chebotko; John Abraham; Pearl Brazier; Anthony Piazza
Original format: PDF
Keywords: HBase; RDF; SPARQL; big data; distributed database; provenance; query; scalability; scientific workflow

Similar documents

1. Storing and querying fuzzy RDF(S) in HBase databases [J]. Tianyi Fan, Li Yan, Zongmin Ma. International Journal of Intelligent Systems, 2020, no. 4.
2. Efficient querying of multidimensional RDF data with aggregates: Comparing NoSQL, RDF and relational data stores [J]. Franck Ravat, Jiefu Song, Olivier Teste. International Journal of Information Management, 2020, issue Octa.
3. Temporal RDF(S) Data Storage and Query with HBase [J]. Li Yan, Zheqing Zhang, Dan Yang. Journal of Computing and Information Technology, 2019, no. 4.
4. Storing, Indexing and Querying Large Provenance Data Sets as RDF Graphs in Apache HBase [C]. Artem Chebotko, John Abraham, Pearl Brazier. IEEE World Congress on Services, 2013.
5. A study of graph partitioning techniques for fast indexing and query processing of a large RDF graph [D]. Dinesh Barenkala. 2013.
6. Quadrant-Based Minimum Bounding Rectangle-Tree Indexing Method for Similarity Queries over Big Spatial Data in HBase [O]. Bumjoon Jo, Sungwon Jung. 2018.
7. Storing and Indexing Massive RDF Data Sets [O]. Yongming Luo, George H. L. Fletcher, Jan Hidders. 2014.