...
【24h】

Engineering Motif Search for Large Motifs

机译:工程图案搜索大型图案

获取原文

摘要

Given a vertex-colored graph H and a multiset M of colors as input, the graph motif problem asks us to decide whether H has a connected induced subgraph whose multiset of colors agrees with M. The graph motif problem is NP-complete but known to admit randomized algorithms based on constrained multilinear sieving over GF(2^b) that run in time O(2^kk^2m {M({2^b})}) and with a false-negative probability of at most k/2^{b-1} for a connected m-edge input and a motif of size k. On modern CPU microarchitectures such algorithms have practical edge-linear scalability to inputs with billions of edges for small motif sizes, as demonstrated by Bj?rklund, Kaski, Kowalik, and Lauri [ALENEX'15]. This scalability to large graphs prompts the dual question whether it is possible to scale to large motif sizes. We present a vertex-localized variant of the constrained multilinear sieve that enables us to obtain, in time O(2^kk^2m{M({2^b})}) and for every vertex simultaneously, whether the vertex participates in at least one match with the motif, with a per-vertex probability of at most k/2^{b-1} for a false negative. Furthermore, the algorithm is easily vector-parallelizable for up to 2^k threads, and parallelizable for up to 2^kn threads, where n is the number of vertices in H. Here {M({2^b})} is the time complexity to multiply in GF(2^b).We demonstrate with an open-source implementation that our variant of constrained multilinear sieving can be engineered for vector-parallel microarchitectures to yield hardware utilization that is bound by the available memory bandwidth. Our main engineering contributions are (a) a version of the recurrence for tightly labeled arborescences that can be executed as a sequence of memory-and-arithmetic coalescent parallel workloads on multiple GPUs, and (b) a bit-sliced low-level implementation for arithmetic in characteristic 2 to support (a).
机译:给定顶点彩色的图表H和一个颜色的多重M型作为输入,图表主题问题要求我们决定H是否有一个连接的诱导的子图,其多种颜色的颜色与M同意。图形主题问题是NP-Create,但已知基于TIME O(2 ^ kK ^ 2m {m({2 ^ b})})的GF(2 ^ b)的受约束多线性筛分的随机算法,并且具有最多k / 2的假阴性概率^ {B-1}对于连接的M-EDGE输入和大小k的图案。在现代CPU微架构上,这种算法具有实际的边缘线性可扩展性,与小型主题尺寸的数十亿边缘输入,如BJ?Rklund,Kaski,Kowalik和Lauri [Alenex'15]所示。对于大图的这种可扩展性提示双重问题是否可以扩展到大型主题大小。我们介绍了一个受约束的多线性筛的顶点局部化变体,使我们能够在时间o(2 ^ kk ^ 2m {m({2 ^ b})})和同时为每个顶点而获得,无论顶点是否参与至少一个与图案匹配,具有最高k / 2 ^ {b-1}的每个顶点概率,用于假阴性。此外,该算法容易载有最多2 ^ k线程的矢量 - 并行,最多2 ^ kn线程,其中n是H中的顶点数。这里是{m({2 ^ b})}是在GF(2 ^ b)中乘以时间复杂性(2 ^ b).we用开源实现展示,我们可以为矢量平行微架构设计我们受约束的多线性筛分的变体,以产生由可用内存带宽束缚的硬件利用率。我们的主要工程贡献是(a)紧密标记的植物学的复发版本,可以作为多个GPU上的一系列内存和算术型平行工作负载执行,并且(B)有点切片的低电平实现特征2的算法以支持(a)。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号