International Conference on High-Performance Computing and Networking

Impact of Data Distribution on Performance of Irregular Reductions on Multithreaded Architectures

Abstract

Computations from many scientific and engineering domains use irregular meshes and/or sparse matrices. The codes expressing these computations involve irregular reductions. The main characteristics of irregular reduction loops are 1) elements of left-hand-side arrays may be incremented in multiple iterations of the loop, but only using associative and commutative operations (these arrays are called reduction arrays), 2) there are no loop-carried dependencies, except on elements of reduction arrays, and 3) one or more arrays are accessed using indirection arrays. It is very challenging to efficiently parallelize codes involving irregular reductions, especially on large parallel machines. Because of accesses through indirection arrays, communication and locality are hard to manage. Not only is the total communication volume large, but the communication requirements typically cannot be determined at compile time. It is also hard to efficiently allocate space for non-local elements. In particular, there are no effective solutions for parallelization of adaptive irregular reductions. In an adaptive irregular reduction, the elements of the indirection arrays are modified after every few iterations. This significantly increases the overhead associated with partitioning and runtime preprocessing routines [7,11], which have been critical for achieving locality, communication efficiency, and effective buffer management. Recently, there has been much interest in multithreaded architectures. A multiprocessor based upon a multithreaded architecture supports multiple threads of execution on each processor. These architectures also support low-cost thread initiation, low-overhead communication, and efficient communication and synchronization between threads on different processors.
Multithreaded architectures are considered a promising medium for scalable parallelization of irregular applications, where frequent communication and synchronization make parallelization hard on conventional parallel machines. We have developed an execution strategy for irregular reductions on a multithreaded architecture. The key idea in our execution model is that the frequency and volume of communication are independent of the contents of the indirection arrays. Thus, unlike other approaches to scalable parallelization of irregular reductions, our approach does not require mesh partitioning [1], array renumbering [6], or a high-cost inspector that itself requires communication between processors [7]. The performance depends upon the architecture's ability to support low-cost communication and to overlap communication with computation, and is largely independent of the problem partitioning. Thus, the same performance can be obtained on adaptive problems, without paying the high overhead of frequent repartitioning.
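
The three characteristics of an irregular reduction loop listed in the abstract can be illustrated with a minimal sequential sketch. This is not the paper's execution strategy; the function name `irregular_reduction`, the edge-list layout, and the sample values are all hypothetical and chosen only for illustration. The list `edges` plays the role of the indirection array, and `force` is the reduction array, updated only with addition and subtraction.

```python
def irregular_reduction(num_nodes, edges, weights):
    """Accumulate per-edge contributions into a per-node reduction array.

    Hypothetical example: `edges` is the indirection array (pairs of node
    indices) and `weights` holds one contribution per edge.
    """
    force = [0.0] * num_nodes  # reduction array
    for i, (a, b) in enumerate(edges):
        contribution = weights[i]
        # The same element of `force` may be updated in many iterations,
        # but only through an associative, commutative operation (+).
        force[a] += contribution
        force[b] -= contribution
    return force


# A tiny triangle mesh: three nodes, three edges (illustrative values only).
edges = [(0, 1), (1, 2), (0, 2)]
weights = [1.0, 2.0, 4.0]
result = irregular_reduction(3, edges, weights)

# Visiting the edges in a different order gives the same result here,
# which is the property that lets the iterations run in parallel.
reordered = irregular_reduction(3, edges[::-1], weights[::-1])
```

Which elements of `force` an iteration touches is known only once `edges` has been read at run time, which is why the communication pattern of such a loop typically cannot be determined at compile time.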