Published in: Design, Automation & Test in Europe Conference & Exhibition

Grater: An approximation workflow for exploiting data-level parallelism in FPGA acceleration

Abstract

Modern applications, including graphics, multimedia, web search, and data analytics, can not only benefit from acceleration but also tolerate a significant degree of imprecise computation. This amenability to approximation provides an opportunity to trade the quality of the results for higher performance and better resource utilization. Exploiting this opportunity is particularly important for FPGA accelerators, which are inherently subject to many resource constraints. To better utilize FPGA resources, we devise Grater, an automated design workflow for FPGA accelerators that leverages imprecise computation to increase data-level parallelism and achieve higher computational throughput. The core of our workflow is a source-to-source compiler that takes an input kernel and applies a novel optimization technique that selectively reduces the precision of the kernel's data and operations. Selectively reducing the precision of the data and operations decreases the area required to synthesize the kernels on the FPGA, which makes it possible to integrate a larger number of operations and parallel kernels in the fixed area of the FPGA. The larger number of integrated kernels provides more hardware contexts to better exploit data-level parallelism in the target applications. To explore the design space of approximate kernels effectively, we use a genetic algorithm to find a subset of safe-to-approximate operations and data elements and then tune their precision levels until the desired output quality is achieved. Grater is a fully software technique and does not require any changes to the underlying FPGA hardware. We evaluate Grater on a diverse set of data-intensive OpenCL benchmarks from the AMD SDK. The synthesis results on a modern Altera FPGA show that our approximation workflow yields 1.4× to 3.0× higher throughput with less than 1% quality loss.
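The search described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and every detail is an assumption made for illustration. The abstract does not say how precision is reduced or how the genetic algorithm encodes candidates; here precision reduction is modeled as truncating float32 mantissa bits, the kernel is a hypothetical two-input weighted sum, and "area" is approximated by the total number of mantissa bits rather than by FPGA synthesis.

```python
import random
import struct

def truncate_mantissa(x, bits):
    """Keep only the top `bits` of a float32's 23 mantissa bits (assumed precision model)."""
    raw = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = 0xFFFFFFFF & ~((1 << (23 - bits)) - 1)
    return struct.unpack("<f", struct.pack("<I", raw & mask))[0]

def kernel(x, y, bits):
    """Hypothetical toy kernel: each intermediate value gets its own mantissa width."""
    a = truncate_mantissa(0.299 * x, bits[0])
    b = truncate_mantissa(0.587 * y, bits[1])
    return truncate_mantissa(a + b, bits[2])

# Fixed, made-up inputs used to measure output quality.
INPUTS = [(1.0 + i * 0.37, 2.0 + i * 0.11) for i in range(32)]
FULL = (23, 23, 23)  # full float32 precision, used as the quality baseline

def quality_loss(bits):
    """Mean relative error of the approximate kernel against the baseline."""
    total = 0.0
    for x, y in INPUTS:
        exact = kernel(x, y, FULL)
        total += abs(kernel(x, y, bits) - exact) / abs(exact)
    return total / len(INPUTS)

def fitness(bits, max_loss=0.01):
    """Lower is better: total bits as an area proxy, with a penalty above the loss budget."""
    penalty = 1000.0 if quality_loss(bits) > max_loss else 0.0
    return sum(bits) + penalty

def search(generations=40, pop_size=20, seed=0):
    """Tiny elitist genetic algorithm over per-value mantissa widths."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(1, 23) for _ in range(3)) for _ in range(pop_size)]
    pop.append(FULL)  # guarantee at least one candidate within the loss budget
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]  # uniform crossover
            i = rng.randrange(3)
            child[i] = min(23, max(1, child[i] + rng.choice((-1, 1))))  # mutation
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=fitness)

if __name__ == "__main__":
    best = search()
    print("mantissa bits:", best, "quality loss:", quality_loss(best))
```

Because the full-precision candidate is seeded into the population and the best half always survives, the returned configuration never exceeds the 1% loss budget in this toy setting. In the workflow the abstract describes, the saved area would then be used to place more parallel kernel instances on the FPGA; that step is outside the scope of this sketch.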
