Real-time object detection in software with custom vector instructions and algorithm changes

Abstract

Real-time vision applications place stringent performance requirements on embedded systems. To meet these requirements, embedded systems often rely on hardware implementations. This approach is unfavorable because hardware development is time-consuming, difficult to debug, and demands extensive skill. This paper presents a case study of accelerating face detection, often part of a complex image processing pipeline, using a software/hardware hybrid approach. As a baseline, the algorithm is first run on the scalar ARM Cortex-A9 application processor found on a Xilinx Zynq device. Next, the algorithm is vectorized for a previously designed vector engine implemented in the FPGA fabric, using only standard vector instructions, to achieve a 25× speedup. Then, the critical inner loops are accelerated by adding two hardware-assisted custom vector instructions for an additional 10× speedup, yielding a 248× speedup over the initial Cortex-A9 baseline. Collectively, the custom instructions require fewer than 800 lines of VHDL code, including comments and blank lines. Compared to previous hardware-only face detection systems, our work is 1.5 to 6.8 times faster. This approach demonstrates that good performance can be obtained from software-only vectorization, and that a small amount of custom hardware can provide a significant additional boost.
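The speedup figures quoted in the abstract compose multiplicatively, which explains why 25× followed by roughly 10× lands near 248×. The short Python sketch below only checks that arithmetic. The function and variable names are hypothetical, and the inputs are the rounded figures from the abstract, not measurements taken from the paper.

```python
def combined_speedup(stage_speedups):
    """Multiply per-stage speedups to get the end-to-end speedup."""
    total = 1.0
    for s in stage_speedups:
        total *= s
    return total


# Rounded figures quoted in the abstract (assumed exact for this check).
vector_speedup = 25.0    # standard vector instructions vs. scalar Cortex-A9
custom_speedup = 10.0    # two custom vector instructions vs. standard vectors
reported_total = 248.0   # end-to-end speedup reported in the abstract

# Nominal total if both rounded figures were exact.
nominal_total = combined_speedup([vector_speedup, custom_speedup])

# Custom-instruction speedup implied by the reported total and the 25x figure.
implied_custom = reported_total / vector_speedup

print(nominal_total)
print(implied_custom)
```

The nominal product is 250, and the reported 248× corresponds to a second-stage factor of about 9.92 if the 25× figure is taken as exact, so the three numbers are consistent once rounding is accounted for.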

Bibliographic details

Source
IEEE International Conference on Application-specific Systems, Architectures and Processors | 2017 | 215p | 8 pages in total
Authors
Joe Edwards; Guy G.F. Lemieux
Original format: PDF
Chinese Library Classification: TP303-53
Keywords
Hardware; Engines; Object detection; Face detection; Field programmable gate arrays; Face;

Similar literature

1. Realizing real-time centroid detection of multiple objects with marching pixels algorithms on programmable customizing hardware [J] . Dietmar Fey, Marc Reichenbach, Marcus Komann. Concurrency and computation: practice and experience . 2012, No. 16
2. High-precision and real-time algorithms of multi-object detection, recognition and localization toward ARVD of cooperative spacecrafts [J] . Zhang Yingjin, Qin Shiyin, Hu Xiaohui. Optik: Zeitschrift fur Licht- und Elektronenoptik = Journal for Light-and Electronoptic . 2015, No. 24
3. A Parallel Hardware Architecture for Real-Time Object Detection with Support Vector Machines [J] . Kyrkou C. Computers, IEEE Transactions on . 2012, No. 6
4. Real-time object detection in software with custom vector instructions and algorithm changes [C] . Joe Edwards, Guy G.F. Lemieux. International Conference on Application-specific Systems, Architectures and Processors . 2017
5. A near real-time, highly scalable, parallel and distributed adaptive object detection and re-training framework based on the AdaBoost algorithm [D] . Abualkibash, Munther. 2015
6. Real-time Concealed Object Detection from Passive Millimeter Wave Images Based on the YOLOv3 Algorithm [O] . Lei Pang, Hui Liu, Yang Chen. 2020
7. Implementation of moving object detection using angle change method [O] . 2017
Abstract: This paper summarizes the work in terms of the input and output parameters considered when detecting moving objects in video surveillance. It also gives an overview of our work area. It discusses techniques used in motion detection, including background subtraction, block matching, frame differencing, and optical flow, and gives a comparative analysis of two algorithms: the colour method (space vector difference method) and the angle change method.
