Comparing Managed Memory and ATS with and without Prefetching on NVIDIA Volta GPUs

Abstract

One of the major differences in many-core versus multicore architectures is the presence of two different memory spaces: a host space and a device space. In the case of NVIDIA GPUs, the device is supplied with data from the host via one of the multiple memory management API calls provided by the CUDA framework, such as cudaMallocManaged and cudaMemcpy. Modern systems, such as the Summit supercomputer, have the capability to avoid the use of CUDA calls for memory management and access the same data on GPU and CPU. This is done via the Address Translation Services (ATS) technology that gives a unified virtual address space for data allocated with malloc and new if there is an NVLink connection between the two memory spaces. In this paper, we perform a deep analysis of the performance achieved when using two types of unified virtual memory addressing: UVM and managed memory.

Bibliographic record

Source
Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems; International Conference for High Performance Computing, Networking, Storage and Analysis | 2019 | pp. 41-46 | 6 pages
Authors
Rahulkumar Gayatri; Kevin Gott; Jack Deslippe
Original format: PDF
Keywords
Graphics processing units; Kernel; Benchmark testing; Memory management; Prefetching; Resource management
