Frontiers in Computational Neuroscience

Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer

Abstract

For neural network simulations on parallel machines, interprocessor spike communication can account for a significant portion of the total simulation time. The performance of several spike exchange methods has been tested on a Blue Gene/P (BG/P) supercomputer with 8–128 K cores, using randomly connected networks of up to 32 M cells with 1 k connections per cell and of 4 M cells with 10 k connections per cell, i.e., on the order of 4·10^10 connections (K is 1024, M is 1024^2, and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective MPI_Allgather and several variants of the non-blocking Multisend method, implemented either via non-blocking MPI_Isend or by exploiting the very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was the one using MPI_Isend, because of the high overhead of initiating a spike communication. The two best performing methods had similar performance, with very low overhead for initiating spike communication: the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast, and a two-phase multisend in which a DCMF_Multicast first sends to a subset of phase-one destination cores, which then pass the spike on to their own subsets of phase-two destination cores. Departure from ideal scaling for the Multisend methods is almost entirely due to load imbalance caused by the large variation in the number of cells that fire on each processor in the interval between synchronizations. Spike exchange time itself is negligible, since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will ultimately be limited by the imbalance among processors in the number of incoming spikes within each interval between synchronizations.
Thus, counterintuitively, maximizing load balance requires that cells be distributed across processors not according to the neural network architecture but randomly, so that sets of cells that burst fire together are placed on different processors, with their targets spread over as large a set of processors as possible.
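The two-phase multisend described in the abstract can be pictured as a two-level fan-out tree. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it models only the routing plan, not DCMF_Multicast or the BG/P DMA hardware, and the rule for choosing relays (sort the destination ranks, cut them into contiguous groups, and use the first rank of each group as the phase-one relay) is a hypothetical choice. The names `two_phase_plan` and `deliveries` are illustrative.

```python
from typing import Dict, List, Sequence


def two_phase_plan(dests: Sequence[int], n_relays: int) -> Dict[int, List[int]]:
    """Map each phase-one relay rank to the ranks it forwards to in phase two.

    Hypothetical grouping rule: sorted destinations are split into n_relays
    contiguous groups of near-equal size; the first rank in each group receives
    the spike directly from the source and relays it to the rest of its group.
    """
    ordered = sorted(set(dests))
    if not ordered:
        return {}
    n_relays = max(1, min(n_relays, len(ordered)))  # no more relays than destinations
    base, extra = divmod(len(ordered), n_relays)
    plan: Dict[int, List[int]] = {}
    start = 0
    for g in range(n_relays):
        size = base + (1 if g < extra else 0)  # first `extra` groups get one more rank
        group = ordered[start:start + size]
        start += size
        plan[group[0]] = group[1:]  # relay -> its phase-two destinations
    return plan


def deliveries(plan: Dict[int, List[int]]) -> Dict[int, int]:
    """Count how many times each rank receives the spike under a plan."""
    counts: Dict[int, int] = {}
    for relay, forwards in plan.items():
        counts[relay] = counts.get(relay, 0) + 1  # phase one: source -> relay
        for rank in forwards:
            counts[rank] = counts.get(rank, 0) + 1  # phase two: relay -> rank
    return counts


if __name__ == "__main__":
    plan = two_phase_plan(range(1, 1001), n_relays=32)
    print(len(plan))  # sends initiated by the source: 32
    print(max(len(v) for v in plan.values()))  # largest phase-two fan-out: 31
    assert all(n == 1 for n in deliveries(plan).values())  # each rank reached once
```

With 1000 destinations and 32 relays, the source initiates 32 sends instead of 1000, and no relay forwards to more than 31 cores; the cost of this design is an extra hop for every phase-two destination.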
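The load-balance argument for random cell placement can be illustrated with a toy model. The sketch below assumes a hypothetical network in which consecutive blocks of `cells_per_proc` cells form assemblies that burst together, plus sparse background firing. It counts the spikes generated on each processor in one synchronization interval and reports the max/mean load, once for a placement that mirrors the assemblies (contiguous blocks) and once for a random shuffle with the same number of cells per processor. All parameter values are arbitrary, the function names are illustrative, and the model covers only the sending side, not how widely each cell's targets are spread.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def spikes_per_proc(owner: Sequence[int], fired: Sequence[int], n_procs: int) -> List[int]:
    """Number of fired cells hosted on each processor."""
    counts = Counter(owner[c] for c in fired)
    return [counts[p] for p in range(n_procs)]


def imbalance(loads: Sequence[int]) -> float:
    """Max/mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def simulate(n_procs: int = 64, cells_per_proc: int = 100, n_bursting: int = 4,
             bg_rate: float = 0.01, seed: int = 1) -> Tuple[float, float]:
    """Return (contiguous, random) imbalance for one hypothetical interval."""
    rng = random.Random(seed)
    n_cells = n_procs * cells_per_proc
    # Placement mirroring the architecture: assembly k lives on processor k.
    contiguous = [c // cells_per_proc for c in range(n_cells)]
    # Random placement with the same number of cells per processor.
    shuffled = contiguous[:]
    rng.shuffle(shuffled)
    # Hypothetical activity: a few whole assemblies burst, others fire sparsely.
    bursting = set(rng.sample(range(n_procs), n_bursting))
    fired = [c for c in range(n_cells)
             if c // cells_per_proc in bursting or rng.random() < bg_rate]
    return (imbalance(spikes_per_proc(contiguous, fired, n_procs)),
            imbalance(spikes_per_proc(shuffled, fired, n_procs)))


if __name__ == "__main__":
    contig, rand = simulate()
    print(f"contiguous placement max/mean: {contig:.2f}")
    print(f"random placement max/mean:     {rand:.2f}")
```

In this model the contiguous placement puts each bursting assembly on a single processor, so its max/mean ratio comes out several times higher than under random placement, where the same burst is spread over many processors.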