Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures

Shouyi Yin; Xinhan Lin; Leibo Liu; Shaojun Wei

Abstract

Coarse-grained reconfigurable architecture (CGRA) is a promising parallel computing platform that provides high performance, high power efficiency and flexibility. However, for imperfect nested loops, existing loop mapping methods often result in low execution performance and poor hardware utilization. To tackle this problem, this paper makes three contributions: 1) a highly effective and general approach to mapping imperfect loops onto CGRAs; 2) a global optimization strategy to search for the optimal initiation intervals (IIs); 3) a powerful kernel compression method to reduce oversized kernels. Experimental results show that our approach reduces the total computing latency by 20.5, 58.5 and 73.2 percent compared to the state-of-the-art approaches on 2×2, 4×4 and 8×8 CGRAs, respectively. Moreover, the compilation time and configuration context size are acceptable in practice.
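
For readers unfamiliar with the terms, the minimal Python sketch below illustrates what an imperfect nested loop looks like and how an initiation interval (II) enters a textbook pipelined-latency estimate. It is not the paper's mapping method; the function names `imperfect_nest` and `pipelined_latency` and the parameter `schedule_length` are assumptions made for illustration only.

```python
def imperfect_nest(a, b):
    """Imperfect nest: the outer loop body has statements outside the inner loop."""
    out = []
    for i in range(len(a)):
        acc = a[i] * 2              # outer-only statement, before the inner loop
        for j in range(len(b)):
            acc += a[i] * b[j]      # inner-loop statement
        out.append(acc)             # outer-only statement, after the inner loop
    return out


def pipelined_latency(iterations, ii, schedule_length):
    """Textbook estimate (an assumption, not the paper's cost model):
    a new iteration starts every `ii` cycles, and the last iteration
    needs `schedule_length` cycles to drain."""
    if iterations <= 0:
        return 0
    return (iterations - 1) * ii + schedule_length


if __name__ == "__main__":
    print(imperfect_nest([1, 2], [3, 4]))
    print(pipelined_latency(iterations=10, ii=2, schedule_length=5))
```

In a perfect nest, every statement would sit inside the innermost loop; the two outer-only statements here are what make the nest imperfect. Under this simple estimate, a smaller II shortens the total latency roughly in proportion to the iteration count.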

Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems | 2016, Issue 11 | pp. 3199-3213 | 15 pages
Authors
Shouyi Yin; Xinhan Lin; Leibo Liu; Shaojun Wei
Author affiliations

Institute of Microelectronics, Tsinghua University, Beijing, China;

Institute of Microelectronics, Tsinghua University, Beijing, China;

National laboratory for Information, Science and Technology, Institute of Microelectronics, Tsinghua University, Beijing, China;

Institute of Microelectronics, Tsinghua University, Beijing, China;

Original format: PDF
Language: English
Keywords
Pipeline processing; Kernel; Context; Computer architecture; Field programmable gate arrays;

Similar literature

1. Mapping Imperfect Loops to Coarse-Grained Reconfigurable Architectures [J]. Hyeonuk Sim, Hongsik Lee, Seongseok Seo. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016, Issue 7.
2. Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures [J]. Yin Shouyi, Liu Dajiang, Peng Yu. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2016, Issue 2.
3. Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures [J]. Liu Dajiang, Yin Shouyi, Peng Yu. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2015, Issue 11.
4. Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures [C]. Xinhan Lin, Shouyi Yin, Leibo Liu. Asia and South Pacific Design Automation Conference, 2016.
5. Exploiting Thread-Level Parallelism on Reconfigurable Architectures: a Cross-Layer Approach [D]. Momeni, Amir. 2017.
6. Exploiting Thread-Level and Instruction-Level Parallelism to Cluster Mass Spectrometry Data using Multicore Architectures [O]. Fahad Saeed, Jason D. Hoffert, Trairak Pisitkun.
7. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling [O]. Bingfeng Mei, Serge Vernalde, Diederik Verkest. 2003.
