A High-Level Synthesis Approach Optimizing Accumulations in Floating-Point Programs Using Custom Formats and Operators


Abstract

Many case studies have demonstrated the potential of Field-Programmable Gate Arrays (FPGAs) as accelerators for a wide range of applications. FPGAs offer massive parallelism and programmability at the bit level. This enables programmers to exploit a range of techniques that avoid many bottlenecks of classical von Neumann computing. However, development costs for FPGAs are orders of magnitude higher than for classical programming. A solution would be the use of High-Level Synthesis (HLS) tools, which use C as a hardware description language. However, the C language was designed to be executed on general-purpose processors, not to generate hardware. Its datatypes and operators are limited to a small number (more or less matching the hardware operators present in mainstream processors), and HLS tools inherit these limitations. To better exploit the freedom offered by hardware and FPGAs, HLS vendors have enriched the C language with integer and fixed-point types of arbitrary size. Still, the operations on these types remain limited to the basic arithmetic and logic ones. In floating point, the current situation is even worse. The operator set is limited, and the sizes are restricted to 32 and 64 bits. Besides, most recent compilers, including the HLS ones, attempt to follow established standards, in particular C11 and IEEE-754. This ensures bit-exact compatibility with software, but greatly reduces the freedom of optimization by the compiler. For instance, a floating-point addition is not associative even though its real equivalent is. In the present work we attempt to give the compiler more freedom. For this, we sacrifice the strict respect of the IEEE-754 and C11 standards, but we replace it with the strict respect of a high-level accuracy specification expressed by the programmer through a pragma. The case study in this work is a program transformation that applies to floating-point additions on a loop's critical path. It decomposes them into elementary steps, resizes the corresponding subcomponents to guarantee some user-specified accuracy, and merges and reorders these components to improve performance. The result of this complex sequence of optimizations could not be obtained from an operator generator, since it involves global loop information. For this purpose, we used a compilation flow involving one or several source-to-source transformations operating on the code given to HLS tools (Figure 1). The proposed transformation already works very well on 3 of the 10 FPMarks, where it improves both latency and accuracy by an order of magnitude for comparable area. For 2 more benchmarks, the latency is not improved (but not degraded either) due to current limitations of HLS tools. This defines short-term future work. The main result of this work is that HLS tools also have the potential to generate efficient designs for handling floating-point computations in a completely non-standard way. In the longer term, we believe that HLS flows can not only import application-specific operators from the FPGA literature, but can also improve them using high-level, program-level information.
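The non-associativity mentioned in the abstract, and the general idea of replacing a floating-point accumulation with a wider fixed-point one, can be illustrated with a small C sketch. This is not the authors' transformation: the names `sum_float`, `sum_fixed` and `FRAC_BITS` are hypothetical, the accumulator width is fixed at 64 bits instead of being derived from an accuracy pragma, and inputs are assumed small enough that the scaled sum does not overflow.

```c
#include <stdint.h>

/* Hypothetical accuracy parameter (not from the paper): the accumulator
   stores values as integer multiples of 2^-FRAC_BITS. */
#define FRAC_BITS 32

/* Plain floating-point accumulation. Each addition rounds to float, so the
   result depends on the order of the inputs. volatile forces every partial
   sum to be stored as a float, even on targets that evaluate expressions
   in extended precision. */
float sum_float(const float *x, int n)
{
    volatile float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}

/* Fixed-point accumulation. Each input is scaled by 2^FRAC_BITS (exact,
   since the scale is a power of two) and truncated to an integer; the
   integer additions are exact and associative, so the inputs may be
   reordered freely. Assumption: the scaled sum fits in 63 bits. */
double sum_fixed(const float *x, int n)
{
    const double scale = (double)(INT64_C(1) << FRAC_BITS);
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)((double)x[i] * scale); /* drops bits below 2^-FRAC_BITS */
    return (double)acc / scale;
}
```

With the inputs {2^24, 1, -2^24}, `sum_float` returns 0 because 2^24 + 1 is not representable in single precision, while the reordered inputs {2^24, -2^24, 1} give 1. `sum_fixed` returns 1 for both orders. In a hardware setting the appeal is that an integer addition is much simpler than a floating-point one, which is the kind of freedom the abstract argues the compiler should be allowed to use.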


Bibliographic details

Source
IEEE Annual International Symposium on Field-Programmable Custom Computing Machines | 2017 | pp. 80-80 | 1 page
Authors
Yohann Uguen; Florent de Dinechin; Steven Derrien
Original format
PDF
Keywords
Field programmable gate arrays; Tools; Hardware; Optimization; C languages; Program processors; Standards

