This paper introduces a novel methodology to optimize coarse-grained floating point units (FPUs) in a hybrid FPGA. We employ common subgraph extraction to determine the number of floating point adders/subtracters (FAs), multipliers (FMs) and wordblocks (WBs) in the FPUs. We flrst study the area, speed and utilization trade-off of the selected FPU subgraphs in a set of floating point benchmark circuits. We then explore the impact of density and flexibility of FPUs on the system in terms of area, speed and routing resources. We derive an optimized coarse-grained FPU by considering both architectural and system level issues. The results show that: (1) embedding more types of coarse-grained FPU in the system causes at most 21.3% increase in delay, (2) the area of the system can be reduced by 27.4% by embedding high density subgraphs, (3) the high density subgraphs requires 14.8% fewer routing resources.
展开▼