...
Operating Systems Review

Mosaic: Enabling Application-Transparent Support for Multiple Page Sizes in Throughput Processors



Abstract

Contemporary discrete GPUs support rich memory management features such as virtual memory and demand paging. These features simplify GPU programming by providing a virtual address space abstraction similar to CPUs and eliminating manual memory management, but they introduce high performance overheads during (1) address translation and (2) page faults. A GPU relies on high degrees of thread-level parallelism (TLP) to hide memory latency. Address translation can undermine TLP, as a single miss in the translation lookaside buffer (TLB) invokes an expensive serialized page table walk that often stalls multiple threads. Demand paging can also undermine TLP, as multiple threads often stall while they wait for an expensive data transfer over the system I/O (e.g., PCIe) bus when the GPU demands a page. In modern GPUs, we face a trade-off on how the page size used for memory management affects address translation and demand paging. The address translation overhead is lower when we employ a larger page size (e.g., 2MB large pages, compared with conventional 4KB base pages), which increases TLB coverage and thus reduces TLB misses. Conversely, the demand paging overhead is lower when we employ a smaller page size, which decreases the system I/O bus transfer latency. Support for multiple page sizes can help relax the page size trade-off so that address translation and demand paging optimizations work together synergistically. However, existing page coalescing (i.e., merging base pages into a large page) and splintering (i.e., splitting a large page into base pages) policies require costly base page migrations that undermine the benefits multiple page sizes provide. In this paper, we observe that GPGPU applications present an opportunity to support multiple page sizes without costly data migration, as the applications perform most of their memory allocation en masse (i.e., they allocate a large number of base pages at once). 
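The page-size trade-off described above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only (the TLB entry count is an assumption, not a figure from the paper): with a fixed number of TLB entries, 2MB large pages cover 512x more address space per entry than 4KB base pages, which is why larger pages reduce TLB misses.

```python
# Illustrative TLB-coverage arithmetic for the page-size trade-off.
# TLB_ENTRIES is an assumed value, not a parameter from the paper.

BASE_PAGE = 4 * 1024          # 4KB conventional base page
LARGE_PAGE = 2 * 1024 * 1024  # 2MB large page
TLB_ENTRIES = 512             # assumed TLB capacity

def tlb_coverage(page_size: int, entries: int = TLB_ENTRIES) -> int:
    """Bytes of address space reachable without a TLB miss."""
    return page_size * entries

base_cov = tlb_coverage(BASE_PAGE)    # 2 MiB with 4KB pages
large_cov = tlb_coverage(LARGE_PAGE)  # 1 GiB with 2MB pages
print(large_cov // base_cov)          # 512x more coverage per entry
```

The flip side, as the abstract notes, is that a demand-paging fault on a 2MB page forces a transfer 512x larger over the system I/O bus than a 4KB fault, which is the tension Mosaic's multiple-page-size support relaxes.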
We show that this en masse allocation allows us to create intelligent memory allocation policies which ensure that base pages that are contiguous in virtual memory are allocated to contiguous physical memory pages. As a result, coalescing and splintering operations no longer need to migrate base pages. We introduce Mosaic, a GPU memory manager that provides application-transparent support for multiple page sizes. Mosaic uses base pages to transfer data over the system I/O bus, and allocates physical memory in a way that (1) preserves base page contiguity and (2) ensures that a large page frame contains pages from only a single memory protection domain. We take advantage of this allocation strategy to design a novel in-place page size selection mechanism that avoids data migration. This mechanism allows the TLB to use large pages, reducing address translation overhead. During data transfer, this mechanism enables the GPU to transfer only the base pages that are needed by the application over the system I/O bus, keeping demand paging overhead low. Our evaluations show that Mosaic reduces address translation overheads while efficiently achieving the benefits of demand paging, compared to a contemporary GPU that uses only a 4KB page size. Relative to a state-of-the-art GPU memory manager, Mosaic improves the performance of homogeneous and heterogeneous multi-application workloads by 55.5% and 29.7% on average, respectively, coming within 6.8% and 15.4% of the performance of an ideal TLB where all TLB requests are hits.
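The key idea of contiguity-conserving allocation can be sketched as follows. This is our own minimal illustration under simplifying assumptions (the class and method names are hypothetical, not Mosaic's implementation): because consecutive base pages of one protection domain are placed in consecutive slots of a single large page frame, coalescing into a large page is a metadata update with no data copied.

```python
# Minimal sketch of contiguity-conserving allocation and in-place
# coalescing. Names and structure are our own illustration, not
# Mosaic's actual code.

BASE_PAGES_PER_LARGE = 512  # 2MB large frame / 4KB base pages

class LargeFrame:
    def __init__(self, frame_id: int, owner: str):
        self.frame_id = frame_id
        self.owner = owner       # one protection domain per frame
        self.allocated = 0       # base-page slots filled, in order
        self.coalesced = False

    def alloc_base_page(self):
        """Hand out the next contiguous base-page slot in this frame,
        so virtual contiguity maps to physical contiguity."""
        slot = self.allocated
        self.allocated += 1
        return (self.frame_id, slot)

    def try_coalesce(self) -> bool:
        # In-place coalescing: the base pages are already physically
        # contiguous, so forming the large page migrates no data.
        if self.allocated == BASE_PAGES_PER_LARGE:
            self.coalesced = True
        return self.coalesced

frame = LargeFrame(frame_id=0, owner="app0")
mappings = [frame.alloc_base_page() for _ in range(BASE_PAGES_PER_LARGE)]
assert mappings[0] == (0, 0) and mappings[-1] == (0, 511)
assert frame.try_coalesce()  # large page formed without migration
```

Splintering is the symmetric operation: dropping the large-page mapping back to the (still contiguous) base pages, which lets the GPU transfer only the individual base pages an application actually demands over the I/O bus.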
