Symbolic Multi-Level Loop Mapping of Loop Programs for Massively Parallel Processor Arrays

Tanase Alexandru; Witterauf Michael; Teich Juergen; Hannig Frank

Abstract

Today's MPSoCs (multiprocessor systems-on-chip) have given rise to massively parallel processor array accelerators that can achieve high computational efficiency by exploiting multiple levels of parallelism and different memory hierarchies. Such parallel processor arrays are perfect targets for accelerating nested loop programs in particular, because of their regular and massively parallel nature. However, existing loop parallelization techniques are often unable to exploit multiple levels of parallelism and are either I/O-bound or memory-bound. Furthermore, if the number of available processing elements becomes known only at runtime, as in adaptive systems, static approaches fail. In this article, we solve some of these problems by proposing a hybrid compile/runtime multi-level symbolic parallelization technique that is able to (a) exploit multiple levels of parallelism, (b) exploit different memory hierarchies, and (c) match the I/O or memory capabilities of the target architecture in scenarios where the number of available processing elements is known only at runtime. Our proposed technique consists of two compile-time transformations: (a) symbolic hierarchical tiling followed by (b) symbolic multi-level scheduling. The tiling levels scheduled in parallel exploit different levels of parallelism, whereas those scheduled sequentially exploit different memory hierarchies. Furthermore, tuning the size of the tiles on the individual levels allows a tradeoff between the necessary I/O bandwidth and memory, which makes it easier to satisfy resource constraints. The resulting schedules are symbolic with respect to the problem size and tile sizes. Thus, the number of processing elements to map onto does not need to be known at compile time. At runtime, when the number of available processors becomes known, a simple prologue chooses a schedule that is feasible with respect to I/O and memory constraints and latency-optimal for the chosen tile size.
In summary, our approach determines the set of feasible, latency-optimal symbolic loop schedule candidates at compile time, from which one is dynamically selected at runtime. This approach exploits multiple levels of parallelism, is independent of the problem size of the loop nest, and thereby avoids any expensive re-compilation at runtime. This is particularly important for low-cost and memory-scarce embedded MPSoC platforms that cannot afford to host a just-in-time compiler.
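The runtime selection step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `ScheduleCandidate`, `select_schedule`, and `CANDIDATES`, the parameters `N` (problem size) and `P` (number of processing elements), and in particular the latency, memory, and I/O expressions, which are invented placeholders rather than the formulas derived in the article. The sketch only shows the general shape of the idea: candidates are stored as expressions over symbolic parameters, and a small prologue evaluates them once the parameter values are known.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Concrete values for the symbolic parameters, supplied at runtime.
Params = Dict[str, int]


@dataclass(frozen=True)
class ScheduleCandidate:
    """Hypothetical compile-time result: a schedule with symbolic cost expressions."""
    name: str
    latency: Callable[[Params], int]       # cycles, as a function of the parameters
    memory: Callable[[Params], int]        # local memory words needed per processing element
    io_bandwidth: Callable[[Params], int]  # words per cycle needed at the array border


def select_schedule(
    candidates: List[ScheduleCandidate],
    params: Params,
    mem_limit: int,
    io_limit: int,
) -> Optional[ScheduleCandidate]:
    """Runtime prologue: keep feasible candidates, return the one with minimal latency."""
    feasible = [
        c for c in candidates
        if c.memory(params) <= mem_limit and c.io_bandwidth(params) <= io_limit
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.latency(params))


# Placeholder candidates; the cost expressions are made up for this example.
CANDIDATES = [
    ScheduleCandidate(
        name="io_parallel",
        latency=lambda v: v["N"] // v["P"] + v["P"],
        memory=lambda v: v["N"] // v["P"],
        io_bandwidth=lambda v: v["P"],
    ),
    ScheduleCandidate(
        name="io_serial",
        latency=lambda v: v["N"] // v["P"] + 2 * v["P"],
        memory=lambda v: v["N"] // v["P"],
        io_bandwidth=lambda v: 1,
    ),
]

if __name__ == "__main__":
    # The number of processing elements P becomes known only here, at runtime.
    runtime_params = {"N": 64, "P": 4}
    chosen = select_schedule(CANDIDATES, runtime_params, mem_limit=32, io_limit=2)
    print(chosen.name if chosen else "no feasible schedule")
```

Because the candidates are plain expressions, selecting among them costs only a few evaluations and comparisons, which is consistent with the abstract's point that no recompilation or just-in-time compiler is needed on the target.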


Bibliographic details

Source: ACM Transactions on Embedded Computing Systems, 2018, Issue 2, 27 pages
Affiliation (all four authors): Friedrich Alexander Univ Erlangen Nurnberg (FAU), Dept Comp Sci, Hardware Software Codesign, Cauerstr 11, D-91058 Erlangen, Germany
Original format: PDF
Language: English
Chinese Library Classification: Programming, software engineering
Keywords: Processor arrays; symbolic parallelization; mapping; loop programs
