IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Enabling Latency-Aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform



Abstract

Nowadays, driven by the needs of autonomous driving and edge intelligence, integrated CPU/GPU heterogeneous platforms have gained significant attention from both academia and industry. As a representative series, the NVIDIA Jetson family performs well in terms of computation capability, power consumption, and mobile form factor. Even so, an integrated heterogeneous platform contains only one limited physical memory, which is shared by the CPU and GPU cores and can become the performance bottleneck of mobile/edge applications. On the other hand, with the unified memory (UM) model introduced in GPU programming, not only is memory allocation significantly reduced, which mitigates the memory bottleneck of integrated platforms, but memory management and programming are also simplified. However, as a programming legacy, the UM model still follows the conventional copy-then-execute model, initializing data on the CPU side after allocating memory. This legacy programming mode not only causes significant initialization latency but also slows the execution of the subsequent kernel. In this article, we propose a framework that enables latency-aware data initialization on the integrated heterogeneous platform. The framework not only includes three data initialization modes (CPU initialization, GPU initialization, and hybrid initialization) but also utilizes an affinity estimation model to wisely select the best initialization mode for an application, so that the application's initialization latency can be optimized. We evaluate our design on the NVIDIA TX2 and AGX platforms. The results demonstrate that the framework can accurately select a data initialization mode for a given application and thereby significantly reduce the initialization latency. We envision this latency-aware data initialization framework being adopted in a full-version autonomous-driving solution (e.g., Autoware) in the future.
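The three initialization modes named in the abstract can be illustrated with standard CUDA unified-memory calls. The sketch below is an assumption-laden illustration, not the paper's actual framework: the kernel `gpu_init` and the split fraction `h` are hypothetical names, and in the real framework `h` (or the choice of mode) would come from the affinity estimation model rather than a constant. It requires an NVIDIA GPU and `nvcc` to build.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// GPU-side initializer (hypothetical name): each thread fills a strided
// range of elements instead of touching the data first on the CPU.
__global__ void gpu_init(float *buf, size_t n, float value) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        buf[i] = value;
}

int main() {
    const size_t n = 1 << 24;
    float *buf = nullptr;

    // Unified memory: a single allocation visible to both CPU and GPU cores,
    // which is what the UM model provides on integrated platforms with one
    // shared physical memory.
    cudaMallocManaged(&buf, n * sizeof(float));

    // Mode 1: CPU initialization (the conventional copy-then-execute legacy).
    // for (size_t i = 0; i < n; ++i) buf[i] = 0.0f;

    // Mode 2: GPU initialization.
    // gpu_init<<<256, 256>>>(buf, n, 0.0f);

    // Mode 3: hybrid initialization, splitting the buffer at fraction h.
    // Here h is a fixed illustrative value; the paper's framework would
    // choose the mode/split via its affinity estimation model.
    const double h = 0.5;
    size_t split = (size_t)(h * n);
    gpu_init<<<256, 256>>>(buf, split, 0.0f);  // GPU fills the front part

    // Some devices (e.g., Jetson TX2, where concurrentManagedAccess == 0)
    // forbid CPU access to managed memory while a kernel is in flight, so
    // synchronize first on such hardware; this serializes the two halves.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.concurrentManagedAccess)
        cudaDeviceSynchronize();

    for (size_t i = split; i < n; ++i)         // CPU fills the rest
        buf[i] = 0.0f;
    cudaDeviceSynchronize();

    printf("initialized %zu floats\n", n);
    cudaFree(buf);
    return 0;
}
```

The point of the hybrid mode is that, where the hardware permits concurrent managed access, the CPU and GPU can touch disjoint halves of the same UM allocation in parallel, overlapping initialization work instead of paying the full first-touch latency on one side.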
