Efficient Data Communication between CPU and GPU through Transparent Partial-Page Migration

Abstract

Despite increasing investment in integrated GPUs and next-generation interconnect research, discrete GPUs connected by PCI Express still dominate the market, and the management of data communication between CPU and GPU continues to evolve. Initially, the programmer controlled data transfer between CPU and GPU explicitly. To simplify programming and enable system-wide atomic memory operations, GPU vendors have developed a programming model that provides a single virtual address space. The page migration engine in this model automatically migrates pages between CPU and GPU on demand. To meet the needs of high-performance workloads, page sizes tend to grow. Because the interconnect has low bandwidth and high latency, migrating a larger page takes longer, which may reduce the overlap of computation and transfer and cause serious performance degradation. In this paper, we propose partial-page migration, which migrates only the requested part of a page, to shorten migration latency and avoid the performance degradation that whole-page migration suffers as pages become larger. Experiments show that partial-page migration can significantly hide the performance overhead of whole-page migration when the page size is 2MB and the PCI Express bandwidth is 16GB/sec, converting an average 72.72× slowdown into a 1.29× speedup relative to programmer-controlled data transfer. Additionally, we examine the impact of page size on TLB miss rate and the impact of migration unit size on execution time, enabling designers to make informed decisions.
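To make the latency argument concrete, the following back-of-envelope sketch estimates the ideal transfer time of a whole 2MB page against smaller migration units at 16GB/sec. It is not the paper's simulator or method. The 4KB and 64KB unit sizes, the function name `transfer_time_us`, and the reading of "GB" as decimal gigabytes are assumptions made for illustration, and the model ignores page-fault handling and fixed link latency.

```python
# Back-of-envelope model: ideal wire time = size / bandwidth.
# Assumptions (not from the paper): decimal GB, no fixed latency,
# and the 4KB / 64KB migration unit sizes below are hypothetical.

PCIE_BANDWIDTH_BYTES_PER_S = 16e9      # 16GB/sec, as in the abstract
PAGE_SIZE_BYTES = 2 * 1024 * 1024      # 2MB page, as in the abstract


def transfer_time_us(size_bytes, bandwidth=PCIE_BANDWIDTH_BYTES_PER_S):
    """Return the ideal transfer time in microseconds for size_bytes."""
    return size_bytes / bandwidth * 1e6


# Whole-page migration moves the full 2MB on every fault.
whole_page_us = transfer_time_us(PAGE_SIZE_BYTES)

# Partial-page migration moves only the requested unit.
partial_us = {unit: transfer_time_us(unit) for unit in (4096, 65536)}

if __name__ == "__main__":
    print(f"whole page (2MB): {whole_page_us:.3f} us")
    for unit, t in partial_us.items():
        print(f"unit {unit // 1024}KB: {t:.3f} us, "
              f"{whole_page_us / t:.0f}x shorter than whole page")
```

Under these assumptions, the time a faulting thread waits scales linearly with the migration unit, which is the effect the abstract attributes to partial-page migration. Actual gains depend on access patterns and on overheads this sketch leaves out.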


Bibliographic record

Source
IEEE International Conference on High Performance Computing and Communications; IEEE International Conference on Smart City; IEEE International Conference on Data Science and Systems | 2018 | pp. 618-625 | 8 pages
Authors
Shiqing Zhang; Yaohua Yang; Li Shen; Zhiying Wang
Original format
PDF
Keywords
Graphics processing units; Data communication; Memory management; Bandwidth; Programming; Delays; Central processing unit

Similar literature

1. Transparent partial page migration between CPU and GPU [J]. Shiqing ZHANG, Zheng QIN, Yaohua YANG. Frontiers of computer science in China. 2020, No. 3
2. GPUs for statistical data analysis in HEP: a performance study of GooFit on GPUs vs. RooFit on CPUs [J]. Alexis Pompili, Adriano Di Florio, CMS Collaboration. Journal of Physics: Conference Series. 2016, No. 1
3. Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation [J]. Markus Holzer, Martin Bauer, Harald Köstler. International Journal of High Performance Computing Applications. 2021, No. 4
4. Efficient Data Communication between CPU and GPU through Transparent Partial-Page Migration [C]. Shiqing Zhang, Yaohua Yang, Li Shen. IEEE International Conference on High Performance Computing and Communications. 2018
5. Efficient Precise Dynamic Data Race Detection for CPU and GPU [D]. Peng, Yuanfeng. 2019
6. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs GPUs and MICs: A Case Study with Microscopy Image Analysis [O]. George Teodoro, Tahsin Kurc, Guilherme Andrade.
7. Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation [O]. Markus Holzer, Martin Bauer, Harald Köstler. 2021
