
HadoopSPARQL: A Hadoop-Based Engine for Multiple SPARQL Query Answering


Abstract

An increasing amount of data represented using the Resource Description Framework (RDF) has appeared on the Semantic Web. By September 2011, datasets from Linked Open Data had grown to 31 billion RDF triples, interlinked by around 504 million RDF links. As a consequence, handling such a large amount of semantic data poses an extremely challenging scalability problem. SPARQL is a standard query language for RDF datasets, and much work has addressed SPARQL query processing. However, most of that work treats SPARQL only as a transaction-based query language and considers low-latency query answering an important design requirement. Furthermore, such query engines process one query at a time and concentrate on single-query optimizations. Nevertheless, users can also use SPARQL in very different scenarios. For example, two users may submit queries to a dataset about publications at the same time. The first user wants a list of all authors who have published at least one proceedings and at least one article, while the second user wants a list of all authors who have published at least one article but not necessarily a proceedings. Query 1 and Query 2 in Fig. 1 are then submitted at the same time.
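The overlap between the two requests in this example can be made concrete with a small sketch. Fig. 1 is not reproduced here, so the SPARQL text, the vocabulary (`ex:Article`, `ex:Proceedings`, `ex:creator`), the toy triples, and the helper names below are all assumptions made for illustration. The code only shows that both queries contain the same "author of an article" sub-pattern, which could therefore be evaluated once; it is not the engine's actual algorithm.

```python
# Hypothetical toy dataset of (subject, predicate, object) triples.
# The vocabulary is assumed for illustration; it is not taken from the paper.
TRIPLES = {
    ("paper1", "rdf:type", "ex:Article"),
    ("paper1", "ex:creator", "alice"),
    ("paper2", "rdf:type", "ex:Article"),
    ("paper2", "ex:creator", "bob"),
    ("proc1", "rdf:type", "ex:Proceedings"),
    ("proc1", "ex:creator", "alice"),
}

# Assumed SPARQL text for the two user requests (a guess at the shape of
# Query 1 and Query 2; the real queries are in the paper's Fig. 1).
QUERY_1 = """
SELECT DISTINCT ?author WHERE {
  ?a rdf:type ex:Article .      ?a ex:creator ?author .
  ?p rdf:type ex:Proceedings .  ?p ex:creator ?author .
}
"""
QUERY_2 = """
SELECT DISTINCT ?author WHERE {
  ?a rdf:type ex:Article .      ?a ex:creator ?author .
}
"""


def creators_of(doc_type, triples):
    """Return every creator of at least one document of the given type."""
    docs = {s for (s, p, o) in triples if p == "rdf:type" and o == doc_type}
    return {o for (s, p, o) in triples if p == "ex:creator" and s in docs}


def answer_both(triples):
    """Answer both queries while evaluating the shared sub-pattern once."""
    # Shared by Query 1 and Query 2: authors of at least one article.
    article_authors = creators_of("ex:Article", triples)
    # Needed only by Query 1: authors of at least one proceedings.
    proceedings_authors = creators_of("ex:Proceedings", triples)
    query_1 = article_authors & proceedings_authors
    query_2 = article_authors
    return query_1, query_2


if __name__ == "__main__":
    q1, q2 = answer_both(TRIPLES)
    print("Query 1:", sorted(q1))
    print("Query 2:", sorted(q2))
```

In this toy data, only `alice` satisfies the first request, while both `alice` and `bob` satisfy the second. An engine that handles one query at a time would scan for article authors twice.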
