Conference: International Conference on Very Large Data Bases

NScale: Neighborhood-centric Analytics on Large Graphs



Abstract

There is increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting in biological networks, finding social circles, personalized recommendations, link prediction, anomaly detection, and analyzing influence cascades. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models restrict the user program to directly accessing the state of a single vertex, resulting in high communication, scheduling, and memory overheads. Further, most existing graph processing frameworks ignore the challenges of extracting the relevant portions of the graph that an analysis task is interested in and loading them into distributed memory. In this demonstration proposal, we describe NScale, a novel end-to-end graph processing framework that enables the distributed execution of complex neighborhood-centric analytics over large-scale graphs in the cloud. NScale enables users to write programs at the level of neighborhoods or subgraphs. It uses Apache YARN for efficient and fault-tolerant distribution of data and computation, and it features GEL, a novel graph extraction and loading phase that extracts the relevant portions of the graph and loads them into distributed memory using as few machines as possible. NScale utilizes novel techniques for the distributed execution of user computation that minimize memory consumption by exploiting overlap among the neighborhoods of interest. A comprehensive experimental evaluation shows orders-of-magnitude improvements in performance and total cost over vertex-centric approaches.
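
To make the idea of writing programs "at the level of neighborhoods or subgraphs" concrete, the following is a minimal single-machine sketch in Python. It is not NScale's API: the function names (`k_hop_neighborhood`, `induced_subgraph`, `run_neighborhood_centric`), the adjacency-dictionary representation, and the toy graph are all assumptions made for illustration. The point it shows is that the user function receives a whole extracted subgraph rather than the state of a single vertex.

```python
from collections import deque


def k_hop_neighborhood(adj, source, k):
    """Return the set of vertices within k hops of `source` (breadth-first search)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        v, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                frontier.append((w, dist + 1))
    return seen


def induced_subgraph(adj, vertices):
    """Keep only the edges whose endpoints both lie in `vertices`."""
    return {v: {w for w in adj.get(v, ()) if w in vertices} for v in vertices}


def count_triangles(sub):
    """Example user program: it sees an entire subgraph at once.

    Assumes an undirected graph with comparable vertex ids, so each
    triangle u < v < w is counted exactly once.
    """
    total = 0
    for u in sub:
        for v in sub[u]:
            if u < v:
                total += sum(1 for w in sub[u] & sub[v] if v < w)
    return total


def run_neighborhood_centric(adj, query_vertices, k, user_fn):
    """Hypothetical driver: extract each k-hop neighborhood, then apply user_fn to it."""
    return {
        q: user_fn(induced_subgraph(adj, k_hop_neighborhood(adj, q, k)))
        for q in query_vertices
    }


# Toy undirected graph (illustrative data only).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# Triangle counts in the 1-hop ego networks of vertices 1 and 4.
results = run_neighborhood_centric(adj, [1, 4], 1, count_triangles)
print(results)
```

In this sketch each neighborhood is materialized separately, so vertices shared by overlapping neighborhoods are copied more than once. The abstract states that NScale instead exploits that overlap to reduce memory consumption and packs the extracted subgraphs onto as few machines as possible; neither of those techniques is modeled here.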
