This paper presents a computationally efficient implementation of the flexible triangle search (FTS) algorithm, a fast block-matching algorithm for motion estimation proposed in previous work. In block-based motion estimation, the FTS locates the best-matching blocks between two frames using a search triangle of flexible size and orientation. This flexibility allows the triangle to locate the best-matching block in fewer search iterations. Further analysis of the FTS performance indicates that additional computational efficiency can be gained in loading the search area and in computing the matching criterion, the sum of absolute differences (SAD), between two macroblocks. Adjacent search areas overlap; consequently, the loading mechanism can be modified to load only the non-overlapping sections. In addition, loading of the search area and other data can start earlier, in parallel with the algorithm initialization. Finally, computed SAD results can be stored and reused later instead of being recomputed. The proposed implementation reduces the average number of cycles required to complete one macroblock search by around 26%, and thus enables the video encoder to support higher frequencies or larger resolutions. The proposed design was implemented on an FPGA as part of the FTS motion estimation algorithm: it was implemented, simulated, and tested using VHDL and synthesized using Xilinx ISE for the Xilinx Spartan3 device. The results obtained were compared with an FPGA implementation of the FTS algorithm published in previous work.
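The idea of storing SAD results and reusing them can be illustrated with a small software model. The sketch below is not the VHDL design described here; it is a minimal Python illustration under stated assumptions: frames are plain lists of pixel rows, SAD is the sum of absolute differences over an n x n block, and the names `sad` and `SadCache` are hypothetical. It shows only that a candidate displacement visited more than once during a search costs a single SAD evaluation.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy).
    Frames are lists of rows; the caller keeps all indices in range."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


class SadCache:
    """Hypothetical helper: remembers SAD values per candidate displacement
    for the search of one macroblock, so revisited candidates are not
    recomputed."""

    def __init__(self, cur, ref, bx, by, n):
        self.cur, self.ref = cur, ref
        self.bx, self.by, self.n = bx, by, n
        self.table = {}      # (dx, dy) -> stored SAD value
        self.computed = 0    # number of SAD evaluations actually performed

    def get(self, dx, dy):
        key = (dx, dy)
        if key not in self.table:
            self.table[key] = sad(self.cur, self.ref,
                                  self.bx, self.by, dx, dy, self.n)
            self.computed += 1
        return self.table[key]
```

In such a model, a search routine would call `cache.get(dx, dy)` for each triangle vertex it evaluates, and `cache.computed` would count how many SAD evaluations were actually needed. How the stored values are organized in the hardware implementation is not specified in this abstract.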