Journal of Parallel and Distributed Computing

Granite: A distributed engine for scalable path queries over temporal property graphs

Abstract

Property graphs are a common form of linked data, with path queries used to traverse and explore them for enterprise transactions and mining. Temporal property graphs are a recent variant where time is a first-class entity to be queried over, and their properties and structure vary over time. These are seen in social, telecom, transit and epidemic networks. However, current graph databases and query engines have limited support for temporal relations among graph entities, have no support for time-varying entities, and/or do not scale on distributed resources. We address this gap by extending a linear path query model over property graphs to include intuitive temporal predicates and aggregation operators over temporal graphs. We design a distributed execution model for these temporal path queries using the interval-centric computing model, and develop a novel cost model to select an efficient execution plan from several candidates. We perform detailed experiments on our Granite distributed query engine using both static and dynamic temporal property graphs as large as 52M vertices, 218M edges and 325M properties, and a 1600-query workload derived from the LDBC benchmark. We frequently offer sub-second query latencies on a commodity cluster, which is 149×-1140× faster than the industry-leading Neo4J shared-memory graph database and the JanusGraph/Spark distributed graph query engine. Granite also completes 100% of the queries for all graphs, compared to only 32-92% workload completion by the baseline systems. Further, our cost model selects a query plan that is within 10% of the optimal execution time in 90% of the cases. Despite the irregular nature of graph processing, we exhibit a weak-scaling efficiency of ≥ 60% on 8 nodes and ≥ 40% on 16 nodes, for most query workloads.
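
To make the idea of a temporal predicate on a linear path query concrete, the following is a minimal single-machine sketch. It is not Granite's query syntax, execution model, or API, none of which are given in the abstract. It assumes each edge carries a half-open validity interval and that a two-hop path qualifies only if its two edges are valid at some common time. The names `Interval`, `Edge` and `two_hop_paths` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open validity interval [start, end). Illustrative only."""
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        # Two half-open intervals share a time point iff each starts before the other ends.
        return self.start < other.end and other.start < self.end

    def intersect(self, other: "Interval") -> "Interval":
        # Only meaningful when overlaps() is True.
        return Interval(max(self.start, other.start), min(self.end, other.end))


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    label: str
    life: Interval  # period during which the edge exists


def two_hop_paths(edges, start, label1, label2):
    """Return (v0, v1, v2, interval) for paths start-[label1]->v1-[label2]->v2
    whose two edges coexist; interval is the time span in which both are valid."""
    out_edges = {}
    for e in edges:
        out_edges.setdefault(e.src, []).append(e)

    results = []
    for e1 in out_edges.get(start, []):
        if e1.label != label1:
            continue
        for e2 in out_edges.get(e1.dst, []):
            # Temporal predicate (an assumption of this sketch): the hops must overlap in time.
            if e2.label == label2 and e1.life.overlaps(e2.life):
                results.append((e1.src, e1.dst, e2.dst, e1.life.intersect(e2.life)))
    return results


# Hypothetical toy data: who follows whom, and when.
edges = [
    Edge("alice", "bob", "follows", Interval(0, 10)),
    Edge("bob", "carol", "follows", Interval(5, 20)),
    Edge("bob", "dave", "follows", Interval(12, 30)),
]
paths = two_hop_paths(edges, "alice", "follows", "follows")
```

In this toy data only the path through `carol` qualifies, because the `bob`-to-`dave` edge begins after the `alice`-to-`bob` edge has ended. A distributed engine would partition the graph and choose among alternative execution plans, which this sketch does not attempt.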
