...
Operating Systems Review

Mosaic: Enabling Application-Transparent Support for Multiple Page Sizes in Throughput Processors



Abstract

Contemporary discrete GPUs support rich memory management features such as virtual memory and demand paging. These features simplify GPU programming by providing a virtual address space abstraction similar to CPUs and eliminating manual memory management, but they introduce high performance overheads during (1) address translation and (2) page faults. A GPU relies on high degrees of thread-level parallelism (TLP) to hide memory latency. Address translation can undermine TLP, as a single miss in the translation lookaside buffer (TLB) invokes an expensive serialized page table walk that often stalls multiple threads. Demand paging can also undermine TLP, as multiple threads often stall while they wait for an expensive data transfer over the system I/O (e.g., PCIe) bus when the GPU demands a page. In modern GPUs, we face a trade-off on how the page size used for memory management affects address translation and demand paging. The address translation overhead is lower when we employ a larger page size (e.g., 2MB large pages, compared with conventional 4KB base pages), which increases TLB coverage and thus reduces TLB misses. Conversely, the demand paging overhead is lower when we employ a smaller page size, which decreases the system I/O bus transfer latency. Support for multiple page sizes can help relax the page size trade-off so that address translation and demand paging optimizations work together synergistically. However, existing page coalescing (i.e., merging base pages into a large page) and splintering (i.e., splitting a large page into base pages) policies require costly base page migrations that undermine the benefits multiple page sizes provide. In this paper, we observe that GPGPU applications present an opportunity to support multiple page sizes without costly data migration, as the applications perform most of their memory allocation en masse (i.e., they allocate a large number of base pages at once). 
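The page-size trade-off described above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only (the TLB entry count is an assumption, not a figure from the paper): with a fixed number of TLB entries, 2MB large pages cover 512x more address space per entry than 4KB base pages, which is why larger pages reduce TLB misses.

```python
# Illustrative TLB-coverage arithmetic for the page-size trade-off.
# TLB_ENTRIES is an assumed value, not a parameter from the paper.

BASE_PAGE = 4 * 1024          # 4KB conventional base page
LARGE_PAGE = 2 * 1024 * 1024  # 2MB large page
TLB_ENTRIES = 512             # assumed TLB capacity

def tlb_coverage(page_size: int, entries: int = TLB_ENTRIES) -> int:
    """Bytes of address space reachable without a TLB miss."""
    return page_size * entries

base_cov = tlb_coverage(BASE_PAGE)    # 2 MiB with 4KB pages
large_cov = tlb_coverage(LARGE_PAGE)  # 1 GiB with 2MB pages
print(large_cov // base_cov)          # 512x more coverage per entry
```

The flip side, as the abstract notes, is that a demand-paging fault on a 2MB page forces a transfer 512x larger over the system I/O bus than a 4KB fault, which is the tension Mosaic's multiple-page-size support relaxes.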
We show that this en masse allocation allows us to create intelligent memory allocation policies which ensure that base pages that are contiguous in virtual memory are allocated to contiguous physical memory pages. As a result, coalescing and splintering operations no longer need to migrate base pages. We introduce Mosaic, a GPU memory manager that provides application-transparent support for multiple page sizes. Mosaic uses base pages to transfer data over the system I/O bus, and allocates physical memory in a way that (1) preserves base page contiguity and (2) ensures that a large page frame contains pages from only a single memory protection domain. We take advantage of this allocation strategy to design a novel in-place page size selection mechanism that avoids data migration. This mechanism allows the TLB to use large pages, reducing address translation overhead. During data transfer, this mechanism enables the GPU to transfer only the base pages that are needed by the application over the system I/O bus, keeping demand paging overhead low. Our evaluations show that Mosaic reduces address translation overheads while efficiently achieving the benefits of demand paging, compared to a contemporary GPU that uses only a 4KB page size. Relative to a state-of-the-art GPU memory manager, Mosaic improves the performance of homogeneous and heterogeneous multi-application workloads by 55.5% and 29.7% on average, respectively, coming within 6.8% and 15.4% of the performance of an ideal TLB where all TLB requests are hits.
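The key idea of contiguity-conserving allocation can be sketched as follows. This is our own minimal illustration under simplifying assumptions (the class and method names are hypothetical, not Mosaic's implementation): because consecutive base pages of one protection domain are placed in consecutive slots of a single large page frame, coalescing into a large page is a metadata update with no data copied.

```python
# Minimal sketch of contiguity-conserving allocation and in-place
# coalescing. Names and structure are our own illustration, not
# Mosaic's actual code.

BASE_PAGES_PER_LARGE = 512  # 2MB large frame / 4KB base pages

class LargeFrame:
    def __init__(self, frame_id: int, owner: str):
        self.frame_id = frame_id
        self.owner = owner       # one protection domain per frame
        self.allocated = 0       # base-page slots filled, in order
        self.coalesced = False

    def alloc_base_page(self):
        """Hand out the next contiguous base-page slot in this frame,
        so virtual contiguity maps to physical contiguity."""
        slot = self.allocated
        self.allocated += 1
        return (self.frame_id, slot)

    def try_coalesce(self) -> bool:
        # In-place coalescing: the base pages are already physically
        # contiguous, so forming the large page migrates no data.
        if self.allocated == BASE_PAGES_PER_LARGE:
            self.coalesced = True
        return self.coalesced

frame = LargeFrame(frame_id=0, owner="app0")
mappings = [frame.alloc_base_page() for _ in range(BASE_PAGES_PER_LARGE)]
assert mappings[0] == (0, 0) and mappings[-1] == (0, 511)
assert frame.try_coalesce()  # large page formed without migration
```

Splintering is the symmetric operation: dropping the large-page mapping back to the (still contiguous) base pages, which lets the GPU transfer only the individual base pages an application actually demands over the I/O bus.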
