Published in: 2014 IEEE 30th International Conference on Data Engineering Workshops

Towards optimization of RDF analytical queries on MapReduce



Abstract

The broadened use of Semantic Web technologies across domains has led to a shift in focus from simple pattern matching queries on RDF data to analytical queries with complex grouping and aggregations. An RDF analytical query involves graph pattern matching, which translates to several join operations due to the fine-grained nature of the RDF data model. Complex analytical queries involve multiple grouping-aggregations on different graph patterns, making such tasks join-intensive. Scale-out processing of RDF analytical queries on existing relational-style MapReduce platforms such as Apache Hive and Pig results in lengthy execution workflows with multiple cycles of I/O and network transfer. Additionally, certain graph patterns result in avoidable redundancy in intermediate results, which negatively impacts processing costs. The PhD thesis summarized in this paper proposes a two-pronged approach to minimize the costs of processing RDF queries on MapReduce: an algebraic approach based on a Nested TripleGroup Data Model and Algebra that reinterprets graph pattern queries in a way that reduces the required number of MapReduce cycles, and special strategies to minimize the redundancy in intermediate data while processing certain graph patterns. The proposed techniques are integrated into Apache Pig. Empirical evaluation of this work on graph pattern queries shows 45–60% performance gains over systems such as Pig and Hive.
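To make the join-intensity point concrete, the following is a minimal, in-memory Python sketch. It is not the thesis's implementation and does not use Pig, Hive, or MapReduce; the data, the property names, and both function names are hypothetical, and each property is assumed to be single-valued per subject. It contrasts matching a star-shaped graph pattern with one join per additional triple pattern against matching it with a single group-by-subject pass, which is only a loose analogue of the grouping idea behind a triplegroup-style model.

```python
from collections import defaultdict

# Toy RDF triples (subject, property, object). All values are made up.
TRIPLES = [
    ("p1", "label", "Widget"),
    ("p1", "price", 10),
    ("p1", "vendor", "v1"),
    ("p2", "label", "Gadget"),
    ("p2", "price", 25),
    ("p3", "label", "Gizmo"),
    ("p3", "price", 7),
    ("p3", "vendor", "v2"),
]

# Star pattern: ?s label ?l . ?s price ?p . ?s vendor ?v
STAR = ("label", "price", "vendor")


def match_star_with_joins(triples, props):
    """Relational style: scan once per property, join on subject each time.

    Returns the matches and the number of joins performed. Assumes each
    property occurs at most once per subject (an illustrative simplification).
    """
    rows = {s: (o,) for s, p, o in triples if p == props[0]}
    joins = 0
    for prop in props[1:]:
        side = {s: o for s, p, o in triples if p == prop}
        # Inner join on subject: keep only subjects present on both sides.
        rows = {s: vals + (side[s],) for s, vals in rows.items() if s in side}
        joins += 1
    return rows, joins


def match_star_with_grouping(triples, props):
    """Grouping style: one pass groups triples by subject, then filters groups."""
    groups = defaultdict(dict)
    for s, p, o in triples:
        if p in props:
            groups[s][p] = o
    # Keep only groups that contain every property of the star pattern.
    return {
        s: tuple(g[p] for p in props)
        for s, g in groups.items()
        if all(p in g for p in props)
    }


joined, join_count = match_star_with_joins(TRIPLES, STAR)
grouped = match_star_with_grouping(TRIPLES, STAR)
print(join_count, joined == grouped)
```

In this toy setting a star pattern with three triple patterns needs two joins in the relational style, while the grouping style reaches the same matches with one grouping pass. On a MapReduce platform each join may correspond to a separate cycle, which is the cost the abstract describes; the actual savings depend on the system and query.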
