Conference: International Parallel and Distributed Processing Symposium

Compiler and Runtime Support for Irregular Reductions on a Multithreaded Architecture

Abstract

Computations from many scientific and engineering domains use irregular meshes and/or sparse matrices. The codes expressing these computations involve irregular reductions. Irregular reductions pose many challenges to parallel architectures and their compilers in terms of parallelization, locality management, and communication optimization. Multithreaded architectures offer rich support for local synchronization, overlapping of communication and computation, and low-overhead communication and thread switching. Therefore, they appear to be promising for scalable parallelization of irregular reductions. This paper presents an execution model and a compilation strategy for supporting irregular reductions on a fine-grained multithreaded architecture. The key aspect of this strategy is that the frequency and volume of communication are independent of the contents of the indirection arrays. The performance obtained depends upon the architecture's ability to overlap communication and computation and is largely independent of the partitioning of the problem. We present experimental results from compiling three scientific kernels involving irregular reductions (mvm, euler, and moldyn) for execution on the EARTH fine-grained multithreaded architecture. On mvm, which does not involve any left-hand-side irregular accesses, we achieve near-linear absolute speedups. For euler and moldyn, which do involve left-hand-side irregular accesses, our strategy initially incurs some overheads, but the relative speedups are very good. In going from 2 to 32 processors, the relative speedups for euler were 9.28 and 10.36 on its two datasets, while the speedups for moldyn were 9.70 and 10.76 on its two datasets.
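To make the abstract's terminology concrete, the following serial Python sketch contrasts the two access patterns it distinguishes. It is an illustration only, not the paper's kernels or its EARTH compilation strategy. It assumes that mvm denotes a sparse matrix-vector product (shown here in CSR form, which is our choice) and that euler and moldyn follow the common edge-loop pattern. The names `mvm_csr`, `edge_reduction`, and `relative_efficiency` are hypothetical.

```python
def mvm_csr(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product y = A x (CSR layout is an assumption).

    The indirection array col_idx appears only on the right-hand side
    (x[col_idx[k]]); each y[i] is written by exactly one outer iteration.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def edge_reduction(num_nodes, edges, weight):
    """Edge loop with left-hand-side irregular accesses.

    The reduction array acc is updated through the indirection pairs
    (a, b), so which elements an iteration writes is known only at run time.
    """
    acc = [0.0] * num_nodes
    for e, (a, b) in enumerate(edges):
        contribution = weight[e]
        acc[a] += contribution
        acc[b] -= contribution
    return acc


def relative_efficiency(speedup, p_from, p_to):
    """Fraction of the ideal relative speedup (p_to / p_from) achieved."""
    return speedup / (p_to / p_from)
```

Going from 2 to 32 processors, the ideal relative speedup is 16, so `relative_efficiency(9.28, 2, 32)` puts the reported euler figure of 9.28 at about 58% of ideal, and the moldyn figure of 10.76 at about 67%.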