Conference: IEEE International Conference on Data Engineering

Pagrol: Parallel Graph OLAP over Large-Scale Attributed Graphs



Abstract

Attributed graphs are becoming important tools for modeling information networks, such as the Web and various social networks (e.g., Facebook, LinkedIn, Twitter). However, managing and analyzing attributed graphs to support effective decision making is computationally challenging. In this paper, we propose Pagrol, a parallel graph OLAP (Online Analytical Processing) system over attributed graphs. In particular, Pagrol introduces a new conceptual Hyper Graph Cube model (an attributed-graph analogue of the data cube model for relational DBMSs) to aggregate attributed graphs at different granularities and levels. The proposed model supports different queries as well as a new set of graph OLAP Roll-Up/Drill-Down operations. Furthermore, on the basis of Hyper Graph Cube, Pagrol provides an efficient MapReduce-based parallel graph cubing algorithm, MRGraph-Cubing, to compute the graph cube for an attributed graph. Pagrol employs several optimization techniques: (a) a self-contained join strategy to minimize I/O cost; (b) a scheme that groups cuboids into batches so as to minimize redundant computation; (c) a cost-based scheme to allocate the batches into bags (each with a small number of batches); and (d) an efficient scheme to process a bag using a single MapReduce job. Results of extensive experimental studies using both real Facebook and synthetic datasets on a 128-node cluster show that Pagrol is effective, efficient and scalable.
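To make the idea of aggregating an attributed graph "at different granularities" more concrete, the following is a minimal single-machine sketch. It is not the MRGraph-Cubing algorithm and not the paper's Hyper Graph Cube definition: it assumes only vertex attributes, directed edges, and COUNT as the aggregate. The function names `aggregate_graph` and `all_cuboids`, and the toy `gender`/`city` data, are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def aggregate_graph(vertices, edges, dims):
    """Collapse vertices that share the same values on `dims` into one group.

    Returns (node_counts, edge_counts): how many original vertices fall into
    each group, and how many original edges connect each ordered pair of groups.
    """
    def key(v):
        return tuple(vertices[v][d] for d in dims)

    node_counts = Counter(key(v) for v in vertices)
    edge_counts = Counter((key(u), key(v)) for u, v in edges)
    return node_counts, edge_counts


def all_cuboids(dimensions):
    """Enumerate every subset of dimensions (one subset per cuboid)."""
    for r in range(len(dimensions) + 1):
        yield from combinations(dimensions, r)


# Hypothetical toy graph: vertex id -> attribute values, plus directed edges.
vertices = {
    1: {"gender": "F", "city": "NYC"},
    2: {"gender": "M", "city": "NYC"},
    3: {"gender": "F", "city": "SF"},
    4: {"gender": "F", "city": "NYC"},
}
edges = [(1, 2), (1, 3), (2, 3), (1, 4)]

# Roll-up: view the graph at the coarser (gender) granularity.
by_gender_nodes, by_gender_edges = aggregate_graph(vertices, edges, ("gender",))

# Drill-down: view the same graph at the finer (gender, city) granularity.
fine_nodes, fine_edges = aggregate_graph(vertices, edges, ("gender", "city"))

cuboids = list(all_cuboids(["gender", "city"]))
```

With n dimensions this naive enumeration visits 2^n cuboids and rescans the whole graph for each one. That repeated work is the kind of redundancy that optimizations such as batching cuboids, as described in the abstract, aim to reduce.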
