Journal: IEEE Transactions on Knowledge and Data Engineering

Efficient Parallel Processing of Distance Join Queries Over Distributed Graphs


Abstract

Distance join queries have recently been recognized as a particularly useful operation over graph data, since they capture graph similarity in a meaningful way. Consequently, they have been studied extensively in recent years. However, current methods are designed for centralized systems and rely on graph embedding for effective pruning and indexing. As graphs grow very large and graph data must be deployed in distributed environments, these techniques become impractical. In this work, we propose a solution for efficient parallel processing of distance join queries over large distributed graphs. There have been emerging efforts devoted to managing large graphs in distributed and parallel systems. Programming models like Pregel and iterative computing frameworks like HaLoop have been proposed to handle queries over distributed graphs. However, they are designed from the perspective of functionality rather than query efficiency. In this work, we define an optimization problem: combining the iterative join and graph exploration methods to minimize the evaluation time of distance join queries. Without sacrificing a system's scalability, our technique exploits a lightweight vertex-centric encoding schema built on a distance-aware partition of the entire graph. Extensive experiments over both real and synthetic large graphs show that, by employing an adaptive query plan generation and scheduling method, we can effectively reduce redundant message passing and I/O costs. Compared to using only the iterative join or only the graph exploration method, our solution reduces query evaluation time by up to one order of magnitude.
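
To make the two baseline strategies named in the abstract concrete, here is a minimal single-machine sketch in Python. It is not the paper's method. It assumes a distance join returns every pair (u, v), with u from one vertex set and v from another, whose hop distance in an unweighted graph is at most a threshold; the abstract does not spell out the definition, so this is an assumption. The function names `bounded_bfs`, `distance_join_explore`, and `distance_join_iterative` are hypothetical, and the sketch omits the distribution, encoding schema, and adaptive query planning that the paper describes.

```python
from collections import deque


def bounded_bfs(adj, source, max_dist):
    """Return {vertex: hop distance} for vertices within max_dist hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand past the distance threshold
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_join_explore(adj, left, right, max_dist):
    """Graph exploration: run one bounded BFS per vertex in `left`."""
    right = set(right)
    result = set()
    for u in left:
        reachable = bounded_bfs(adj, u, max_dist)
        result.update((u, v) for v in reachable if v in right)
    return result


def distance_join_iterative(adj, left, right, max_dist):
    """Iterative join: join the (source, vertex) frontier with the edges once per hop."""
    right = set(right)
    seen = {(u, u) for u in left}  # distance-0 pairs
    frontier = set(seen)
    for _ in range(max_dist):
        # One join step: extend every frontier pair by one edge.
        frontier = {(s, w) for (s, v) in frontier for w in adj.get(v, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return {(s, v) for (s, v) in seen if v in right}


# Hypothetical toy graph: a path a - b - c - d (undirected adjacency lists).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
pairs = distance_join_explore(adj, ["a"], ["c", "d"], 2)
```

On this toy graph both functions return the same pair set. The exploration variant does its work per source vertex, while the iterative variant does it per hop across all sources at once. The abstract's optimization problem is about combining the two strategies to minimize evaluation time.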
