IEEE International Conference on Data Engineering

Towards optimization of RDF analytical queries on MapReduce

Abstract

The broadened use of Semantic Web technologies across domains has led to a shift in focus from simple pattern-matching queries on RDF data to analytical queries with complex grouping and aggregations. An RDF analytical query involves graph pattern matching, which translates to several join operations due to the fine-grained nature of the RDF data model. Complex analytical queries involve multiple grouping-aggregations on different graph patterns, making such tasks join-intensive. Scale-out processing of RDF analytical queries on existing relational-style MapReduce platforms such as Apache Hive and Pig results in lengthy execution workflows with multiple cycles of I/O and network transfer. Additionally, certain graph patterns result in avoidable redundancy in intermediate results, which negatively impacts processing costs. The PhD thesis summarized in this paper proposes a two-pronged approach to minimize costs when processing RDF queries on MapReduce: an algebraic approach based on a Nested TripleGroup Data Model and Algebra that reinterprets graph pattern queries in a way that reduces the required number of map-reduce cycles, and special strategies to minimize the redundancy in intermediate data while processing certain graph patterns. The proposed techniques are integrated into Apache Pig. Empirical evaluation of this work for processing graph pattern queries shows 45–60% performance gains over systems such as Pig and Hive.
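
The contrast between join-based and grouping-based matching of a graph pattern can be illustrated with a small in-memory sketch. This is not the paper's Nested TripleGroup algebra and it does not run on MapReduce; the toy triples, the property names, and both function names are assumptions made up for illustration. It only shows why a star-shaped pattern over fine-grained triples needs one join per additional property in a relational-style plan, while grouping triples by subject can match the same pattern in a single grouping pass.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("p1", "type", "Product"),
    ("p1", "price", 30),
    ("p1", "vendor", "v1"),
    ("p2", "type", "Product"),
    ("p2", "price", 50),
    ("p2", "vendor", "v1"),
    ("p3", "type", "Product"),
    ("p3", "vendor", "v2"),  # no price, so p3 does not match the pattern
]

# Star pattern: all three properties must be present on the same subject.
STAR = ("type", "price", "vendor")


def match_star_by_joins(triples, props):
    """Relational style: select triples per property, then join on subject.

    Returns the matched rows and the number of joins performed.
    """
    rows = [{"s": s, props[0]: o} for s, p, o in triples if p == props[0]]
    joins = 0
    for prop in props[1:]:
        right = [(s, o) for s, p, o in triples if p == prop]
        # Nested-loop join on the subject column.
        rows = [
            dict(row, **{prop: o})
            for row in rows
            for s, o in right
            if s == row["s"]
        ]
        joins += 1
    return rows, joins


def match_star_by_grouping(triples, props):
    """Grouping style: group triples by subject once, keep complete groups."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((p, o))
    required = set(props)
    return {
        s: pairs
        for s, pairs in groups.items()
        if required <= {p for p, _ in pairs}
    }


if __name__ == "__main__":
    rows, joins = match_star_by_joins(TRIPLES, STAR)
    print(sorted(r["s"] for r in rows), "joins:", joins)
    print(sorted(match_star_by_grouping(TRIPLES, STAR)))
```

Both functions return the same matching subjects; the join-based version performs one join per property beyond the first, whereas the grouping version makes a single pass. In a distributed setting each join or grouping would usually involve its own shuffle, which is the kind of cost the abstract refers to, though the actual savings depend on the system and query.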