Speeded-Up Robust Features (SURF) is widely used in computer vision applications. In many recent application settings, such as mobile devices and vision sensor networks, it is extremely difficult for SURF implementations to meet both performance and power consumption requirements, especially for CPU-, GPU-, DSP-, or FPGA-based solutions. In this paper, the SURF algorithm is simplified and optimized for hardware implementation. To increase throughput, procedures such as orientation assignment and descriptor extraction are reorganized while maintaining sufficient accuracy; memory accesses are improved to increase bandwidth and reduce repeated data accesses; and the workload of each pipeline stage is analyzed and balanced to reduce pipeline bubbles. Furthermore, a method called Word Length Reduction (WLR) is adopted to compress the integral image, which reduces the on-chip memory by 40%. The corresponding power consumption is also reduced significantly. The simplified SURF is implemented on a 3.4 × 4.0 mm² chip called SURFEX in a TSMC 65 nm process. The chip can process 57 frames of 1080p (1920 × 1080) video per second at a working frequency of 200 MHz while dissipating 220 mW. This throughput is six times that reported in the recent literature, and the power consumption is less than half that of the best previously reported implementations.
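The abstract does not describe how WLR works internally. As an illustration only, the following minimal Python sketch shows one well-known way to shorten integral-image words: store each entry modulo 2^b and recover box sums with modular arithmetic, which is exact as long as the true box sum is below 2^b. The function names `integral_image` and `box_sum`, the 15-bit word length, and the 9 × 9 box size are assumptions chosen for the example, not details taken from the paper, and the scheme shown may differ from the one used in SURFEX.

```python
def integral_image(img, bits=None):
    """Return the (h+1) x (w+1) integral image of a 2-D list of pixels.

    If `bits` is given, every stored entry is reduced modulo 2**bits
    (illustrative word-length reduction; not necessarily the paper's WLR).
    """
    h, w = len(img), len(img[0])
    mask = (1 << bits) - 1 if bits is not None else None
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            value = ii[y][x + 1] + row_sum
            ii[y + 1][x + 1] = value & mask if mask is not None else value
    return ii


def box_sum(ii, x0, y0, x1, y1, bits=None):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1.

    With `bits` set, the four-corner difference is taken modulo 2**bits.
    The result is exact only if the true sum is smaller than 2**bits.
    """
    s = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
    return s & ((1 << bits) - 1) if bits is not None else s


if __name__ == "__main__":
    # Synthetic 8-bit test image (hypothetical data, 64 x 48 pixels).
    img = [[(7 * x + 13 * y) % 256 for x in range(64)] for y in range(48)]

    full = integral_image(img)              # unbounded word length
    reduced = integral_image(img, bits=15)  # 15-bit words

    # A 9 x 9 box of 8-bit pixels sums to at most 81 * 255 = 20655 < 2**15,
    # so the 15-bit version recovers every 9 x 9 box sum exactly.
    ok = all(
        box_sum(reduced, x, y, x + 9, y + 9, bits=15)
        == box_sum(full, x, y, x + 9, y + 9)
        for y in range(48 - 9 + 1)
        for x in range(64 - 9 + 1)
    )
    print("all 9x9 box sums match:", ok)
```

The design point this illustrates is that the required word length depends on the largest box filter rather than on the image size, which is why a shorter word can be enough for on-chip integral-image storage.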