Conference: International Symposium on Microarchitecture

TAPAS: Generating Parallel Accelerators from Parallel Programs

Abstract

High-level synthesis (HLS) tools generate accelerators from software programs to ease the task of building hardware. Unfortunately, current HLS tools have limited support for concurrency, which limits the speedup achievable with the generated accelerator. Current approaches target only fixed, static patterns (e.g., pipelines, data-parallel kernels). This constrains the ability of software programmers to express concurrency. Moreover, the generated accelerator loses a key benefit of parallel hardware, dynamic asynchrony, and with it the potential to hide long latencies and cache misses. We have developed TAPAS, an HLS toolchain for generating parallel accelerators from programs with dynamic parallelism. TAPAS is built on top of Tapir [22], [39], which embeds fork-join parallelism into the compiler's intermediate representation (IR). TAPAS leverages the compiler IR to identify parallelism and synthesizes the hardware logic. TAPAS provides first-class architectural support for spawning, coordinating, and synchronizing tasks during accelerator execution. We demonstrate that TAPAS can generate accelerators for concurrent programs with heterogeneous, nested, and recursive parallelism. Our evaluation on Intel-Altera DE1-SoC and Arria-10 boards demonstrates that TAPAS-generated accelerators achieve 20× the power efficiency of an Intel Xeon while maintaining comparable performance. We also show that TAPAS enables lightweight tasks that can be spawned in about 10 cycles, allowing accelerators to exploit available fine-grain parallelism. TAPAS is a complete HLS toolchain for synthesizing parallel programs to accelerators and is open source.
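
To make the phrase "recursive fork-join parallelism" concrete, the following is a minimal sketch of the program shape the abstract refers to: a task spawns a child, continues with its own work, and then waits for the child. It is written in Python with threads purely for illustration. It is not TAPAS input, it does not use Tapir, and the names `fib`, `fib_serial`, and `cutoff` are hypothetical choices made for this example.

```python
import threading


def fib_serial(n):
    """Plain sequential Fibonacci, used below the spawn cutoff."""
    return n if n < 2 else fib_serial(n - 1) + fib_serial(n - 2)


def fib(n, cutoff=10):
    """Recursive fork-join Fibonacci (illustrative sketch only).

    Each call at or above the cutoff spawns one child task, runs the
    other branch itself, and then joins the child before combining.
    The number of tasks depends on the input, so the parallelism is
    dynamic and nested rather than a fixed static pattern.
    """
    if n < cutoff:
        return fib_serial(n)

    result = {}

    def child():
        result["left"] = fib(n - 1, cutoff)

    task = threading.Thread(target=child)  # "spawn" the child task
    task.start()
    right = fib(n - 2, cutoff)             # continuation runs alongside the child
    task.join()                            # "sync": wait for the spawned child
    return result["left"] + right


if __name__ == "__main__":
    print(fib(16))
```

Software threads are far heavier than the hardware tasks the abstract describes, which is why this sketch needs a cutoff to limit spawning; the abstract's claim is that spawning in about 10 cycles makes much finer-grained tasks worthwhile.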