IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Enabling Latency-Aware Data Initialization for Integrated CPU/GPU Heterogeneous Platform



Abstract

Nowadays, driven by the needs of autonomous driving and edge intelligence, integrated CPU/GPU heterogeneous platforms have gained significant attention from both academia and industry. As a representative series, the NVIDIA Jetson family performs well in terms of computation capability, power consumption, and mobile form factor. Even so, an integrated heterogeneous platform contains only one limited physical memory, which is shared by the CPU and GPU cores and can become the performance bottleneck of mobile/edge applications. On the other hand, with the unified memory (UM) model introduced in GPU programming, not only is memory allocation significantly reduced, which mitigates the memory bottleneck of integrated platforms, but memory management and programming are also simplified. However, as a programming legacy, the UM model still follows the conventional copy-then-execute model, initializing data on the CPU side after allocating memory. This legacy programming mode not only causes significant initialization latency but also slows the execution of the subsequent kernel. In this article, we propose a framework that enables latency-aware data initialization on the integrated heterogeneous platform. The framework not only includes three data initialization modes (CPU initialization, GPU initialization, and hybrid initialization) but also utilizes an affinity estimation model to wisely select the best initialization mode for an application, so that the application's initialization latency can be optimized. We evaluate our design on the NVIDIA TX2 and AGX platforms. The results demonstrate that the framework can accurately select a data initialization mode for a given application and thereby significantly reduce the initialization latency. We envision this latency-aware data initialization framework being adopted in a full-version autonomous-driving solution (e.g., Autoware) in the future.
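The three initialization modes named in the abstract can be illustrated with standard CUDA unified-memory calls. The sketch below is an assumption-laden illustration, not the paper's actual framework: the kernel `gpu_init` and the split fraction `h` are hypothetical names, and in the real framework `h` (or the choice of mode) would come from the affinity estimation model rather than a constant. It requires an NVIDIA GPU and `nvcc` to build.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// GPU-side initializer (hypothetical name): each thread fills a strided
// range of elements instead of touching the data first on the CPU.
__global__ void gpu_init(float *buf, size_t n, float value) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        buf[i] = value;
}

int main() {
    const size_t n = 1 << 24;
    float *buf = nullptr;

    // Unified memory: a single allocation visible to both CPU and GPU cores,
    // which is what the UM model provides on integrated platforms with one
    // shared physical memory.
    cudaMallocManaged(&buf, n * sizeof(float));

    // Mode 1: CPU initialization (the conventional copy-then-execute legacy).
    // for (size_t i = 0; i < n; ++i) buf[i] = 0.0f;

    // Mode 2: GPU initialization.
    // gpu_init<<<256, 256>>>(buf, n, 0.0f);

    // Mode 3: hybrid initialization, splitting the buffer at fraction h.
    // Here h is a fixed illustrative value; the paper's framework would
    // choose the mode/split via its affinity estimation model.
    const double h = 0.5;
    size_t split = (size_t)(h * n);
    gpu_init<<<256, 256>>>(buf, split, 0.0f);  // GPU fills the front part

    // Some devices (e.g., Jetson TX2, where concurrentManagedAccess == 0)
    // forbid CPU access to managed memory while a kernel is in flight, so
    // synchronize first on such hardware; this serializes the two halves.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.concurrentManagedAccess)
        cudaDeviceSynchronize();

    for (size_t i = split; i < n; ++i)         // CPU fills the rest
        buf[i] = 0.0f;
    cudaDeviceSynchronize();

    printf("initialized %zu floats\n", n);
    cudaFree(buf);
    return 0;
}
```

The point of the hybrid mode is that, where the hardware permits concurrent managed access, the CPU and GPU can touch disjoint halves of the same UM allocation in parallel, overlapping initialization work instead of paying the full first-touch latency on one side.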
