The Discrete Cosine Transform (DCT) is used in place of the Discrete Fourier Transform (DFT) in a wide variety of audio and image processing applications because its energy compaction properties approach those of the optimal Karhunen-Loève transform. Previous work in reconfigurable hardware has focused on implementations of 2D 8×8 transforms of the type commonly used in the JPEG and MPEG standards. Several applications for larger DCTs exist, including feature extraction from image data, the solution of partial differential equations (PDEs), and methods that use the Preconditioned Conjugate Gradient (PCG) method, such as phase unwrapping. This paper presents an indirect algorithm, and its implementation on a Xilinx FPGA, that performs 1D DCTs on large block sizes using a block floating point format. Because the larger point sizes supported would otherwise consume all available chip area, the DCT was designed to use fewer resources than other popular approaches, at the cost of higher latency. This latency is similar to that required for an identically sized FFT. A 512-point DCT has been shown to take 1771 cycles, or 13.3 µs at 133 MHz, compared with a similarly sized FFT that takes 1757 cycles, or 13.2 µs (including all component transfer times).
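The abstract does not say which indirect algorithm is used, so the following is only an illustrative sketch of one common way to compute a 1D DCT indirectly through an FFT of the same length: reorder the input into even-indexed samples followed by reversed odd-indexed samples, take an N-point FFT, and rotate each output bin by a twiddle factor. It assumes the unnormalized DCT-II, a power-of-two block size, and ordinary floating point arithmetic; it does not model the block floating point format or the FPGA architecture. The function names `fft`, `dct_via_fft`, and `dct_direct` are hypothetical.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT. Assumes len(a) is a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dct_via_fft(x):
    """Unnormalized DCT-II computed indirectly with one N-point FFT."""
    n = len(x)
    # Even-indexed samples in order, then odd-indexed samples reversed.
    v = list(x[0::2]) + list(x[1::2])[::-1]
    spectrum = fft(v)
    # Rotate each bin by exp(-j*pi*k/(2N)) and keep the real part.
    return [
        (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
        for k in range(n)
    ]


def dct_direct(x):
    """Reference O(N^2) DCT-II: X[k] = sum x[n] cos(pi (2n+1) k / (2N))."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]
```

Because the dominant cost in this formulation is a single FFT of the same length plus a reordering pass and one twiddle multiplication per output, it is at least consistent with the observation that the DCT latency (1771 cycles) is close to that of an identically sized FFT (1757 cycles). The quoted times follow from cycles divided by clock frequency: 1771 / 133 MHz ≈ 13.3 µs and 1757 / 133 MHz ≈ 13.2 µs.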