Digital-signal-processing (DSP) systems often rely on hardware accelerators and high-speed function interpolators to meet the throughput and power requirements of modern applications such as wireless communication, portable multimedia players, and high-performance graphics processing. Truncated-matrix multipliers and squarers are arithmetic units in which some of the least-significant columns of partial-product bits are not formed. Such units trade computational accuracy for improved area, delay, and power. Since many DSP algorithms are multiply- and/or square-intensive and can tolerate some additional computational error, truncated-matrix units are an attractive design option. In spite of this, little work has been published on the system-level use and optimization of truncated-matrix multipliers or squarers in DSP systems.

This dissertation presents methods for using truncated-matrix multipliers and squarers in high-performance DSP hardware that allow area, delay, and power benefits to be achieved without the added error compromising the quality of the system output. Finite-impulse-response (FIR) filters, two-dimensional discrete cosine transform and inverse discrete cosine transform (2-D DCT and IDCT) hardware accelerators, and function interpolators are studied. Unlike previous research, which considers only the unit level, a system-level approach is taken to reduce the overall error of the system output. This approach can improve the system output significantly compared to using unit-level techniques alone. System-level techniques including coefficient shifting, system-level constant correction, and error apportioning are developed.

Results show that significant reductions in area and power can be realized for each of these systems while maintaining acceptable error characteristics.
FIR filters are shown to have signal-to-noise ratios and frequency responses nearly identical to filters using standard multipliers, while using truncated-matrix multipliers that require approximately 35% less area. DCT and IDCT hardware accelerators using truncated-matrix multipliers with up to 44% less area and 44% less power are shown to compress and decompress images that are indistinguishable from images processed using standard multipliers. The computational portions of quadratic function interpolators designed for +/-1 unit in the last place (ulp) accuracy are shown to require up to 34% less area and 25% less power when modified to use truncated-matrix multipliers and squarers, while maintaining +/-1 ulp accuracy. A thorough survey of existing unit-level techniques is given, complete with detailed error analysis and synthesis estimates. Software tools are developed that perform fast, bit-accurate simulation and generate structural Verilog models for synthesis. These tools enable further research and the conversion of existing designs to use truncated-matrix multipliers and squarers.
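
To make the notion of a truncated-matrix multiplier concrete, the following is a minimal bit-level sketch in Python. It is an illustration, not the dissertation's simulation tool. It assumes unsigned n-bit operands and models only the omission of the k least-significant partial-product columns; rounding of the final product is not modeled. The function names and the `correction` parameter are hypothetical. The `correction` argument stands in for a generic unit-level correction constant; the system-level techniques named in the abstract are not reproduced here.

```python
def truncated_multiply(a, b, n, k, correction=0):
    """Multiply two unsigned n-bit integers, omitting every partial-product
    bit that falls in a column below k. `correction` is an optional constant
    added to compensate for the omitted bits (illustrative assumption)."""
    total = correction
    for i in range(n):
        for j in range(n):
            if i + j >= k:  # keep only columns k and above
                bit = ((a >> i) & 1) & ((b >> j) & 1)
                total += bit << (i + j)
    return total


def mean_dropped_value(n, k):
    """Exhaustively average (exact product - truncated product) over all
    pairs of n-bit inputs. Practical only for small n."""
    pairs = [(a, b) for a in range(1 << n) for b in range(1 << n)]
    dropped = sum(a * b - truncated_multiply(a, b, n, k) for a, b in pairs)
    return dropped / len(pairs)


def expected_dropped_value(k):
    """Closed-form mean of the omitted bits for uniformly distributed inputs,
    valid for k <= n: column c holds c + 1 bits of weight 2**c, and each
    partial-product bit is 1 with probability 1/4."""
    return sum((c + 1) * (1 << c) for c in range(k)) / 4
```

With k = 0 the function reduces to an ordinary multiply. For small operand widths, the exhaustive mean from `mean_dropped_value` can be compared against `expected_dropped_value` to choose a correction constant under the uniform-input assumption.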