CRIMSON: Compute-Intensive Loop Acceleration by Randomized Iterative Modulo Scheduling and Optimized Mapping on CGRAs

Balasubramanian Mahesh; Shrivastava Aviral



Abstract

Coarse-grained reconfigurable arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by a CGRA relies on the CGRA compiler efficiently mapping the compute-intensive loops onto the CGRA architecture. Because the CGRA mapping problem is NP-complete, it is solved in a two-step process: scheduling and mapping. The scheduling algorithm allocates timeslots to the nodes of the data flow graph, and the mapping algorithm maps the scheduled nodes onto the processing elements of the CGRA. On a mapping failure, the initiation interval (II) is increased and a new schedule is obtained for the increased II. Most previous mapping techniques use the iterative modulo scheduling (IMS) algorithm to find a schedule for a given II. Since IMS generates a resource-constrained as-soon-as-possible (ASAP) schedule, even with an increased II it tends to generate a similar schedule that is not mappable. Therefore, IMS does not explore the schedule space effectively. To address these issues, this article proposes CRIMSON (compute-intensive loop acceleration by randomized IMS and optimized mapping), a technique that generates random modulo schedules by exploring the schedule space, thereby creating different modulo schedules at a given II and at an increased II. CRIMSON also employs a novel conservative test after scheduling to prune valid schedules that are not mappable. In a study of the top 24 performance-critical loops (those that run for more than 7% of application time) from MiBench, Rodinia, and Parboil, previous state-of-the-art approaches that use IMS, such as RAMP and GraphMinor, could not map five and seven loops, respectively, on a 4 x 4 CGRA, whereas CRIMSON was able to map them all. For loops mapped by the previous approaches, CRIMSON achieved a comparable II.
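
The schedule-then-map loop described in the abstract can be illustrated with a small sketch. This is not CRIMSON's actual algorithm, which the abstract does not specify in detail. It is a minimal illustration under stated assumptions: unit-latency operations, no loop-carried dependences, a data flow graph given as a dictionary in topological order, and a caller-supplied `mappable` predicate standing in for the real mapping step. All function and parameter names are hypothetical.

```python
import random


def random_modulo_schedule(preds, num_pes, ii, rng):
    """Assign a timeslot to every node, choosing randomly among legal slots.

    preds: dict mapping node -> list of predecessor nodes; keys must be in
           topological order (an assumption of this sketch).
    num_pes: number of processing elements available per cycle.
    ii: initiation interval being attempted.
    Returns a dict node -> time, or None if some node cannot be placed.
    """
    usage = [0] * ii  # operations already placed in each modulo slot
    time = {}
    for node, ps in preds.items():
        # Earliest legal time: one cycle after the latest predecessor.
        earliest = max((time[p] + 1 for p in ps), default=0)
        # A window of ii consecutive cycles covers every modulo slot once.
        candidates = [t for t in range(earliest, earliest + ii)
                      if usage[t % ii] < num_pes]
        if not candidates:
            return None
        t = rng.choice(candidates)  # random pick instead of always ASAP
        time[node] = t
        usage[t % ii] += 1
    return time


def schedule_and_map(preds, num_pes, mappable, max_ii=16, tries_per_ii=50, seed=0):
    """Try several random schedules per II; increase II when none is mappable.

    mappable(schedule, ii) is a hypothetical placeholder for the mapping step.
    Returns (ii, schedule) or None if no II up to max_ii succeeds.
    """
    rng = random.Random(seed)
    # Resource-bound lower limit on II: ceil(number of ops / number of PEs).
    min_ii = max(1, -(-len(preds) // num_pes))
    for ii in range(min_ii, max_ii + 1):
        for _ in range(tries_per_ii):
            sched = random_modulo_schedule(preds, num_pes, ii, rng)
            if sched is not None and mappable(sched, ii):
                return ii, sched
    return None


if __name__ == "__main__":
    # Tiny example graph: c depends on a and b; d and e depend on c.
    dfg = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["c"]}
    print(schedule_and_map(dfg, num_pes=2, mappable=lambda s, ii: True))
```

Drawing each timeslot at random from the legal window, rather than always taking the earliest one, is what lets repeated attempts at the same II produce different schedules. A deterministic ASAP scheduler would return the same schedule on every attempt, so the inner retry loop would add nothing.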


Bibliographic record

Source: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, no. 11, pp. 3300-3310 (11 pages)
Authors: Balasubramanian Mahesh; Shrivastava Aviral
Affiliations:

Arizona State University, School of Computing, Informatics, and Decision Systems Engineering, Tempe, AZ 85287, USA

Arizona State University, School of Computing, Informatics, and Decision Systems Engineering, Tempe, AZ 85048, USA

Original format: PDF
Language: English
Keywords: Coarse-grained reconfigurable arrays (CGRAs); compiler; modulo scheduling; randomized scheduling

