首页> 外文期刊>Journal of Parallel and Distributed Computing >Parallelizing maximal clique and k-plex enumeration over graph data
【24h】

Parallelizing maximal clique and k-plex enumeration over graph data

机译:图数据的并行最大集团和k重枚举

获取原文
获取原文并翻译 | 示例

摘要

In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs cliques and k-plex is an essential component. Unfortunately, these problems are NP-Complete and thus computationally intensive at scale — hence there is a need to come up with techniques for distributing the computation across multiple machines such that the computation, which is too time-consuming on a single machine, can be efficiently performed on a machine cluster given that it is large enough. In this paper, we first propose a new approach for maximal clique and k-plex enumeration, which identifies dense subgraphs by binary graph partitioning. Given a connected graph G = (V, E), it has a space complexity of O(|E|) and a time complexity of O(|E|μ(G)), where μ(G) represents the number of different cliques (k-plexes) existing in G. It recursively divides a graph until each task is sufficiently small to be processed in parallel. We then develop parallel solutions and demonstrate how graph partitioning can enable effective load balancing. Finally, we evaluate the performance of the proposed approach on real and synthetic graph data and show that it performs considerably better than existing approaches in both centralized and parallel settings. In the parallel setting, it can achieve the speedups of up to lOx over existing approaches on large graphs. Our parallel algorithms are primarily implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily generalize to other shared-nothing or shared-memory parallel frameworks. The work presented in this paper is an extension of our preliminary work on the approach of binary graph partitioning for maximal clique enumeration. In this work, we extend the proposed approach to handle maximal k-plex detection as well.
机译:在各种新兴的数据密集型应用程序中,例如社交网络分析,Web文档聚类,实体解析以及系统生物学中始终表达的共同表达基因的检测中,密集子图集团和k-plex的检测是必不可少的组件。不幸的是,这些问题是NP-Complete的,因此在规模上计算量很大-因此,需要提出一种将计算分布在多台计算机上的技术,这样一来,在一台计算机上的计算就很费时了。只要足够大,就可以在机器集群上高效地执行。在本文中,我们首先提出了一种用于最大集团和k-plex枚举的新方法,该方法通过二元图划分来识别密集子图。给定一个连通图G =(V,E),它的空间复杂度为O(| E |),时间复杂度为O(| E |μ(G)),其中μ(G)表示相异数它递归地划分图,直到每个任务足够小以至于可以并行处理。然后,我们开发并行解决方案,并演示图分区如何实现有效的负载平衡。最后,我们评估了该方法在真实和合成图形数据上的性能,并表明,在集中式和并行设置下,该方法的性能均比现有方法好得多。在并行设置中,与大图上的现有方法相比,它可以实现高达10倍的加速。我们的并行算法主要在MapReduce上实现和评估,MapReduce是一种流行的无共享并行框架,但是可以轻松地推广到其他无共享或共享内存并行框架。本文介绍的工作是对我们的最大工作集团枚举二进制图分区方法的初步工作的扩展。在这项工作中,我们将提出的方法扩展为也可以处理最大k重检测。

著录项

  • 来源
  • 作者单位

    School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, PR China;

    School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, PR China;

    School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, PR China;

    School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, PR China;

    School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, PR China;

    School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, 710072, PR China;

    Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104-6389, USA;

  • 收录信息
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Maximal clique enumeration; Maximal k-plex enumeration; Parallel graph processing; MapReduce;

    机译:最大集团枚举;最大k-plex枚举;并行图处理;MapReduce;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号