Conference: International Conference on Application-specific Systems, Architectures and Processors
Real-time object detection in software with custom vector instructions and algorithm changes

Abstract

Real-time vision applications place stringent performance requirements on embedded systems. To meet these requirements, embedded systems often rely on hardware implementations. This approach is unfavorable because hardware development can be time-consuming, difficult to debug, and demanding of specialized skill. This paper presents a case study of accelerating face detection, often one stage of a complex image processing pipeline, using a hybrid software/hardware approach. As a baseline, the algorithm is first run on the scalar ARM Cortex-A9 application processor found on a Xilinx Zynq device. Next, using a previously designed vector engine implemented in the FPGA fabric, the algorithm is vectorized using only standard vector instructions, achieving a 25× speedup. We then accelerate the critical inner loops by adding two hardware-assisted custom vector instructions for an additional speedup of roughly 10×, yielding a 248× speedup over the initial Cortex-A9 baseline. Collectively, the custom instructions require fewer than 800 lines of VHDL code, including comments and blank lines. Compared to previous hardware-only face detection systems, our work is 1.5 to 6.8 times faster. This approach demonstrates that good performance can be obtained from software-only vectorization, and that a small amount of custom hardware can provide a significant further speedup.
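The speedup figures in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that the two optimization stages apply in sequence to the same workload, so their speedups multiply, and that the abstract reports rounded values. The function names are hypothetical and do not come from the paper.

```python
def combined_speedup(stage_speedups):
    """Multiply per-stage speedups that are applied one after another."""
    total = 1.0
    for s in stage_speedups:
        total *= s
    return total


def implied_stage_speedup(total, earlier):
    """Speedup a later stage must contribute, given the overall and earlier speedups."""
    return total / earlier


# Figures quoted in the abstract (assumed to be rounded).
vector_only = 25.0   # standard vector instructions vs. scalar Cortex-A9
overall = 248.0      # custom vector instructions vs. scalar Cortex-A9

custom = implied_stage_speedup(overall, vector_only)
nominal = combined_speedup([25.0, 10.0])

print(f"implied custom-instruction speedup: {custom:.2f}x")  # 9.92x
print(f"nominal 25x * 10x = {nominal:.0f}x")                 # 250x
```

Under these assumptions, the custom instructions contribute about 9.92×, which matches the "additional 10×" in the abstract once rounding is taken into account.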
