Venue: IEEE International Conference on Application-specific Systems, Architectures and Processors

Real-time object detection in software with custom vector instructions and algorithm changes

Abstract

Real-time vision applications place stringent performance requirements on embedded systems. To meet these requirements, embedded systems often require hardware implementations. This approach is unfavorable because hardware development can be difficult to debug and time-consuming, and can require extensive skill. This paper presents a case study of accelerating face detection, often one part of a complex image processing pipeline, using a software/hardware hybrid approach. As a baseline, the algorithm is first run on the scalar ARM Cortex-A9 application processor found on a Xilinx Zynq device. Next, the algorithm is vectorized for a previously designed vector engine implemented in the FPGA fabric, using only standard vector instructions, to achieve a 25× speedup. We then accelerate the critical inner loops by adding two hardware-assisted custom vector instructions for an additional 10× speedup, yielding a 248× speedup over the initial Cortex-A9 baseline. Collectively, the custom instructions require fewer than 800 lines of VHDL code, including comments and blank lines. Compared to previous hardware-only face detection systems, our work is 1.5 to 6.8 times faster. These results demonstrate that good performance can be obtained from software-only vectorization, and that a small amount of custom hardware can provide a significant further speedup.
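
The stage speedups quoted in the abstract compose multiplicatively, which is why 25× followed by 10× lands near 248× and not at 35×. A rough consistency check follows, assuming the 25× and 10× figures are rounded (the abstract does not give unrounded values). The symbols $S_{\text{vec}}$, $S_{\text{custom}}$, and $S_{\text{total}}$ are names introduced here for illustration only:

```latex
S_{\text{total}} = S_{\text{vec}} \times S_{\text{custom}} \approx 25 \times 10 = 250,
\qquad
\frac{248}{25} = 9.92 \approx 10 .
```

The product of the rounded figures is within 1% of the reported 248×, so the three numbers are mutually consistent.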
