A barrier to wide-spread use of Field Programmable Gate Arrays (FPGAs) has been the complexity of programming, but recent advances in High-Level Synthesis (HLS) have made it possible for non-experts to easily create floating-point numerical accelerators from C-like code. However, HLS users are limited to the set of numerical primitives provided by HLS vendors and designers of floating-point IP cores, and cannot easily implement new fast or accurate numerical primitives. This paper presents a method for automatically creating high-performance pipelined floating-point function approximations, which can be integrated as IP cores into numerical accelerators, whether derived from HLS or traditional design methods. Both input and output are floating-point, but internally the function approximator uses fixed-point polynomial segments, guaranteeing a faithfully rounded output. A robust and automated non-uniform segmentation scheme is used to segment any twice-differentiable input function and produce platform-independent VHDL. The approach is demonstrated across ten functions, which are automatically generated then placed and routed in Xilinx devices. The method provides a 1.1x-3x improvement in area over composite numerical approximations, while providing similar performance and significantly better relative error.
展开▼