Parallelization Approaches for Hardware Accelerators - Loop Unrolling Versus Loop Partitioning

International Conference on Architecture of Computing Systems (ARCS 2009)

Abstract

State-of-the-art behavioral synthesis tools offer few high-level transformations for achieving highly parallelized implementations. Where they offer any, it is loop unrolling, applied to obtain higher throughput. In this paper, we employ the PARO behavioral synthesis tool, which has the unique ability to perform either loop unrolling or loop partitioning. Loop unrolling replicates the loop kernel and thereby exposes parallelism for hardware implementation, whereas loop partitioning tiles the loop program onto a regular array of tightly coupled processing elements. Using the same design tool for both variants enables, for the first time, a quantitative evaluation of the two approaches for reconfigurable architectures, with the help of computationally intensive algorithms selected from different benchmarks. The processor array approach achieves superlinear speedups in throughput and also reduces area and power cost.
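The difference between the two transformations can be illustrated in software. The Python sketch below is not PARO's implementation: the 4-tap FIR kernel, its coefficients, the unroll factor of 4, and the names `kernel`, `reference`, `unrolled_by_4`, and `partitioned` are hypothetical choices made for illustration. All three versions compute the same outputs; they differ only in how the loop iterations are grouped, which is what would determine the parallel hardware structure. Hardware concurrency is only simulated here, since the replicated kernel copies and the processing elements run one after another.

```python
from typing import List

# Hypothetical example kernel: a 4-tap FIR filter.
COEFFS = [1, 2, 3, 4]


def kernel(x: List[int], i: int) -> int:
    """One loop iteration: y[i] = sum_j COEFFS[j] * x[i + j]."""
    return sum(c * x[i + j] for j, c in enumerate(COEFFS))


def reference(x: List[int], n: int) -> List[int]:
    """Original sequential loop: one kernel instance per iteration."""
    return [kernel(x, i) for i in range(n)]


def unrolled_by_4(x: List[int], n: int) -> List[int]:
    """Loop unrolling: the kernel body is replicated four times per trip.

    In hardware, the four copies could be instantiated side by side and
    evaluated concurrently; here they simply execute in sequence.
    """
    y = [0] * n
    i = 0
    while i + 4 <= n:
        # Four replicated copies of the loop kernel.
        y[i] = kernel(x, i)
        y[i + 1] = kernel(x, i + 1)
        y[i + 2] = kernel(x, i + 2)
        y[i + 3] = kernel(x, i + 3)
        i += 4
    # Epilogue for leftover iterations when n is not a multiple of 4.
    while i < n:
        y[i] = kernel(x, i)
        i += 1
    return y


def partitioned(x: List[int], n: int, num_pes: int) -> List[int]:
    """Loop partitioning: the iteration space is split into contiguous tiles.

    Each tile is assigned to one (simulated) processing element, which runs
    the original, non-replicated loop kernel over its own tile.
    """
    tile = -(-n // num_pes)  # ceiling division: iterations per PE
    y = [0] * n
    for pe in range(num_pes):  # each pe stands for one element of the array
        start = pe * tile
        end = min(start + tile, n)
        for i in range(start, end):  # PE-local sequential loop
            y[i] = kernel(x, i)
    return y


if __name__ == "__main__":
    x = list(range(20))           # input samples
    n = len(x) - len(COEFFS) + 1  # number of valid output samples
    y = reference(x, n)
    assert unrolled_by_4(x, n) == y
    assert partitioned(x, n, num_pes=3) == y
    print(y[:3])  # prints [20, 30, 40]
```

The sketch simplifies matters by letting every processing element read from one shared input list. In an array of tightly coupled processing elements, input samples needed across tile boundaries would presumably travel over the links between neighboring elements instead.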