Tensor-Based CUDA Optimization for ANN Inferencing Using Parallel Acceleration on Embedded GPU



Abstract

With image processing, robots acquired visual perception skills, enabling them to become autonomous. Since the emergence of Artificial Intelligence (AI), sophisticated tasks such as object identification have become possible through inferencing with Artificial Neural Networks (ANNs). However, Autonomous Mobile Robots (AMRs) are Embedded Systems (ESs) with limited on-board resources, so efficient ANN inferencing techniques are required for real-time performance. This paper presents the process of optimizing ANN inferencing using tensor-based optimization on an embedded Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA) platform for parallel acceleration on an ES. This research evaluates a renowned network, namely You-Only-Look-Once (YOLO), on the NVIDIA Jetson TX2 System-On-Module (SOM). The findings show a significant improvement in inferencing speed, measured in Frames-Per-Second (FPS), of up to 3.5 times the non-optimized inferencing speed. Furthermore, the current CUDA model and TensorRT optimization techniques are studied, comments are made on their implementation for inferencing, and improvements are proposed based on the results acquired. These findings will contribute to the work of ES developers, and industries will benefit from real-time inferencing performance for AMR automation solutions.


Bibliographic details

Source
IFIP WG 12.5 International workshops on artificial intelligence applications and innovations; Mining Humanistic Data Workshop; Workshop on 5G-Putting Intelligence to the Network Edge | 2020 | pp. 291-302 | 12 pages
Authors
Ahmed Khamis Abdullah Al Ghadani; Waleeja Mateen; Rameshkumar G. Ramaswamy
Original format: PDF
Keywords
Artificial Neural Networks; Embedded GPU; TensorRT; Real-time; NVIDIA Jetson; Image processing; YOLO; CUDA;


