Impact of Data Distribution on Performance of Irregular Reductions on Multithreaded Architectures

Abstract

Computations from many scientific and engineering domains use irregular meshes and/or sparse matrices. The codes expressing these computations involve irregular reductions. The main characteristics of irregular reduction loops are 1) elements of left-hand-side arrays may be incremented in multiple iterations of the loop, but only using associative and commutative operations (these arrays are called reduction arrays), 2) there are no loop-carried dependencies, except on elements of reduction arrays, and 3) one or more arrays are accessed using indirection arrays. It is very challenging to efficiently parallelize codes involving irregular reductions, especially on large parallel machines. Because of accesses through indirection arrays, communication and locality are hard to manage. Not only is the total communication volume large, but the communication requirements typically cannot be determined at compile time. It is also hard to efficiently allocate space for non-local elements. In particular, there are no effective solutions for parallelization of adaptive irregular reductions. In an adaptive irregular reduction, the elements of the indirection arrays are modified after every few iterations. This significantly increases the overhead associated with partitioning and runtime preprocessing routines [7,11], which have been critical for achieving locality, communication efficiency, and effective buffer management. Recently, there has been much interest in multithreaded architectures. A multiprocessor based upon a multithreaded architecture supports multiple threads of execution on each processor. These architectures also support low-cost thread initiation, low-overhead communication, and efficient communication and synchronization between threads on different processors.
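The three characteristics listed above can be made concrete with a small edge-based sweep over a mesh. This is a minimal Python sketch under illustrative assumptions: the names `x`, `y`, `ia`, `ib`, `flux`, `owner`, and `remote_updates`, the flux formula, and the block distribution are hypothetical and not taken from the paper. The second function plays the role of a simple inspector: it counts how many reduction-array updates would cross a processor boundary, something that can only be known once the indirection arrays are filled in.

```python
from typing import List, Sequence


def flux(a: float, b: float) -> float:
    # Hypothetical edge interaction; stands in for a real physics kernel.
    return 0.5 * (a - b)


def irregular_reduction(y: Sequence[float], ia: Sequence[int], ib: Sequence[int]) -> List[float]:
    """One sweep over the edges; x is the reduction array, ia/ib the indirection arrays."""
    x = [0.0] * len(y)
    for i in range(len(ia)):
        f = flux(y[ia[i]], y[ib[i]])  # y is read through the indirection arrays
        # An element of x may be updated in many iterations, but only by adding a
        # contribution, so the iterations can run in any order.
        x[ia[i]] += f
        x[ib[i]] -= f
    return x


def owner(node: int, n_nodes: int, n_procs: int) -> int:
    # Block distribution: processor p owns nodes [p * block, (p + 1) * block).
    block = -(-n_nodes // n_procs)  # ceiling division
    return node // block


def remote_updates(ia: Sequence[int], ib: Sequence[int], n_nodes: int, n_procs: int) -> int:
    """Inspector-style count of updates to x[ib[i]] that are off-processor
    when edge i is executed by the owner of x[ia[i]]."""
    return sum(
        owner(a, n_nodes, n_procs) != owner(b, n_nodes, n_procs)
        for a, b in zip(ia, ib)
    )
```

With `y = [1.0, 2.0, 3.0, 4.0]` on two processors, the chain `ia = [0, 1, 2]`, `ib = [1, 2, 3]` produces one off-processor update, while `ib = [2, 3, 0]` produces three. The communication pattern therefore changes whenever the indirection arrays do, which is why adaptive problems force partitioning and inspector work to be redone.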
Multithreaded architectures are considered a promising medium for scalable parallelization of irregular applications, where the frequent communication and synchronization make parallelization hard on conventional parallel machines. We have developed an execution strategy for irregular reductions on a multithreaded architecture. The key idea in our execution model is that the frequency and volume of communication are independent of the contents of the indirection arrays. Thus, unlike other approaches to scalable parallelization of irregular reductions, our approach does not require mesh partitioning [1], array renumbering [6], or a high-cost inspector that itself requires communication between processors [7]. The performance depends upon the architecture's ability to support low-cost communication and to overlap communication and computation, and is largely independent of the problem partitioning. Thus, the same performance can be obtained on adaptive problems, without paying the high overhead of partitioning frequently.
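One way to make communication frequency and volume independent of the indirection arrays is to move the reduction array itself around a ring of processors. The sketch below is a hypothetical, sequential Python simulation of such a rotation scheme; it is not presented as the paper's execution strategy. It assumes that edges are split into equal contiguous chunks by iteration number, that `y` is read-only and replicated on every processor, and that the reduction array is divided into `n_procs` contiguous portions. The function `rotating_reduction` and its parameters are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple


def rotating_reduction(
    y: Sequence[float],
    ia: Sequence[int],
    ib: Sequence[int],
    n_procs: int,
    flux: Callable[[float, float], float],
) -> Tuple[List[float], int]:
    """Sequential simulation of a hypothetical ring-rotation scheme.

    Returns the reduction array x and the number of portion transfers.
    """
    n, m = len(y), len(ia)
    size = -(-n // n_procs)   # length of each reduction-array portion (ceiling)
    chunk = -(-m // n_procs)  # edges per processor, assigned by iteration number only
    portions = [[0.0] * max(0, min(n, (q + 1) * size) - q * size) for q in range(n_procs)]

    # Local phase (no communication): each processor computes its edges'
    # contributions and bins them by the portion of x they update.
    updates = []
    for p in range(n_procs):
        bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_procs)]
        for i in range(p * chunk, min(m, (p + 1) * chunk)):
            f = flux(y[ia[i]], y[ib[i]])
            bins[ia[i] // size].append((ia[i], f))
            bins[ib[i] // size].append((ib[i], -f))
        updates.append(bins)

    # Rotation phase: at step k, processor p holds portion (p + k) % n_procs.
    holding = list(range(n_procs))
    transfers = 0
    for step in range(n_procs):
        for p in range(n_procs):
            q = holding[p]
            for idx, val in updates[p][q]:
                portions[q][idx - q * size] += val
        if step < n_procs - 1:
            # Each processor receives the portion held by its ring neighbour p + 1.
            holding = [holding[(p + 1) % n_procs] for p in range(n_procs)]
            transfers += n_procs
    return [v for part in portions for v in part], transfers
```

Whatever `ia` and `ib` contain, the simulation performs `n_procs - 1` rotation steps in which every portion moves once, so `transfers` is always `n_procs * (n_procs - 1)`; only the local binning depends on the indirection arrays, and it needs no communication. A real multithreaded implementation would overlap the transfer of one portion with computation on another, which this sequential model does not capture.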

Bibliographic Details

Source: International Conference on High-Performance Computing and Networking, 2001, 10 pages
Authors: Gary Zoppetti; Gagan Agrawal; Rishi Kumar
Original format: PDF
Chinese Library Classification: TP3-53

Similar Literature

1. Designing Next-Generation Massively Multithreaded Architectures for Irregular Applications [J]. Tumeo Antonino, Secchi Simone, Villa Oreste. Computer, 2012, no. 8.
2. Performance assessment of multithreaded quicksort algorithm on simultaneous multithreaded architecture [J]. Basel A. Mahafzah. Journal of Supercomputing, 2013, no. 1.
3. Data transformations enabling loop vectorization on multithreaded data parallel architectures [J]. Jang Byunghyun, Mistry Perhaad, Schaa Dana, ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2010, no. 5.
4. Impact of Data Distribution on Performance of Irregular Reductions on Multithreaded Architectures [C]. Gary Zoppetti, Gagan Agrawal, Rishi Kumar. International Conference on High-Performance Computing and Networking, 2001.
5. High Performance Soft Processor Architectures for Applications with Irregular Data- and Instruction-Level Parallelism [D]. Aasaraai, Kaveh, 2014.
6. A database of human gait performance on irregular and uneven surfaces collected by wearable sensors [O]. Yue Luo, Sarah M. Coppola, Philippe C. Dixon, 2020.
7. Data and Workload Distribution in a Multithreaded Architecture [O]. Andrew Sohn, Mitsuhisa Sato, Namhoon Yoo, 1996.