Conference: International Conference series on Parallel Computing

Streams as an alternative to halo exchange



Abstract

Many scientific problems can be solved computationally by using linear algebra to calculate numerical solutions. The data dependencies in linear algebra operations are often expressed in a stencil structure. The ubiquitous parallelisation technique for stencil codes, which enables the solution of large problems on parallel architectures, is domain decomposition. One limitation of domain decomposition, affecting the scalability and overall performance of any stencil code, is the need to transmit information from neighbouring stencil grid points between domains, and in particular between processes; this is known as halo exchange. The need for halo exchange limits the size of problems that can be solved numerically, as performance often scales poorly, particularly in poorly load-balanced systems. Extensive work has been done to improve the performance of such codes, for example multi-depth halos and dynamic load balancing by changing the domain size; however, these methods all use halos as the basic communication pattern. In this paper we propose an alternative to the traditional halo exchange method of transmitting data between processes. In our model, inter-process communication becomes a unidirectional stream rather than an exchange: data is transmitted in one direction only. This removes the pair-wise synchronisation between neighbouring processes inherent in the halo exchange pattern. As all communication is unidirectional, data can be updated in place rather than double-buffered, resulting in half the memory usage of the traditional approach. We also present preliminary findings which show that, for a one-dimensional implementation of this model using a Jacobi stencil code, the streaming method is consistently faster than the exchange pattern. At 16,000 processes, for both strong and weak scaling, streaming is over ten times faster than halo exchange. We are currently investigating further optimisation strategies and extending the model to multi-dimensional domain decomposition.
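
For readers unfamiliar with the baseline, the traditional halo-exchange pattern that the abstract describes can be sketched as follows. This is a minimal single-process Python illustration, not the authors' code and not their streaming method, whose details the abstract does not give. The function names `jacobi_serial` and `jacobi_halo`, the in-memory copy standing in for message passing, and the assumption of equal-sized subdomains are all hypothetical choices made for illustration.

```python
def jacobi_serial(u, steps):
    """Reference: 1-D Jacobi sweeps on one array with fixed end values."""
    u = list(u)
    for _ in range(steps):
        new = u[:]  # second buffer: every update reads only old values
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


def jacobi_halo(u, steps, nparts):
    """Same sweeps on `nparts` subdomains, each padded with one halo cell per side.

    Assumption: the number of interior points divides evenly by `nparts`.
    The list copies below stand in for the messages a real code would send.
    """
    interior = list(u[1:-1])
    size = len(interior) // nparts
    # Local layout per subdomain: [left halo, owned cells..., right halo]
    parts = [[0.0] + interior[p * size:(p + 1) * size] + [0.0]
             for p in range(nparts)]
    parts[0][0] = u[0]      # fixed global boundary values live in the
    parts[-1][-1] = u[-1]   # outermost halo cells and are never overwritten

    for _ in range(steps):
        # Halo exchange: each neighbouring pair swaps boundary values.
        # In a distributed code this pair-wise swap is a synchronisation point.
        for p in range(nparts - 1):
            parts[p][-1] = parts[p + 1][1]   # right halo <- neighbour's first cell
            parts[p + 1][0] = parts[p][-2]   # left halo  <- neighbour's last cell
        # Double-buffered local update of the owned cells.
        for p in range(nparts):
            old = parts[p]
            new = old[:]
            for i in range(1, size + 1):
                new[i] = 0.5 * (old[i - 1] + old[i + 1])
            parts[p] = new

    result = [u[0]]
    for part in parts:
        result.extend(part[1:-1])  # drop halo cells when reassembling
    result.append(u[-1])
    return result
```

In this sketch every sweep begins with a two-way swap between neighbours and keeps two copies of each subdomain, which are the two costs (pair-wise synchronisation and double buffering) that the abstract says the streaming model avoids.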
