IEEE Computer Architecture Letters

Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications


Abstract

Utilizing small (e.g., 4 KB) pages incurs frequent TLB misses in modern big-memory applications, substantially degrading system performance. Large (e.g., 1 GB) pages or direct segments can alleviate this page-table-walk penalty, but at the same time such a strategy exposes the organizational and operational details of modern DRAM-based memory systems to applications. Row-buffer conflicts, caused by accesses from multiple threads that target the same DRAM bank but different rows, are regarded as the main culprit behind the very large gap between peak and achieved main-memory throughput; hardware-based approaches in memory controllers have achieved only limited success, and existing proposals that change memory allocators cannot be applied to large pages or direct segments. In this paper, we propose a set of application-level techniques to improve the effective main-memory bandwidth. The techniques stem from two key observations: 1) each thread of an application exclusively accesses certain datasets for a short or long period of time, and 2) superfluous memory reads caused by the cache's write-allocate policy can be avoided if scattered writes during data shuffling pass through intermediate, cache-friendly buffers. Experiments on a contemporary x86 server show that combining large pages with the proposed address-linearization, bank-coloring, and write-streaming techniques improves the performance of three big-memory applications, namely a high-throughput key-value store, fast Fourier transform, and radix sort, by 37.6, 22.9, and 68.1 percent, respectively.
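
To make the bank-coloring and address-linearization ideas concrete: because a 1 GB page is physically contiguous, the low 30 address bits an application sees are the same bits the memory controller uses to pick a DRAM bank, so a thread can steer its data onto a private set of banks entirely from user space. The C sketch below illustrates this under assumed parameters; the bank-bit positions (BANK_SHIFT), bank count (NUM_BANKS), and the helpers bank_of/colored_region are illustrative assumptions, not the paper's actual DRAM mapping or code.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << 26)   /* 30 == log2(1 GB), 26 == MAP_HUGE_SHIFT */
#endif

#define PAGE_1G     (1UL << 30)
#define BANK_SHIFT  13u           /* assumed: bank index taken from bits [16:13] */
#define NUM_BANKS   16u           /* assumed bank count */
#define BANK_STRIDE (1UL << BANK_SHIFT)

/* Bank index of an offset within the 1 GB page (assumed DRAM mapping). */
static unsigned bank_of(uintptr_t off)
{
    return (unsigned)((off >> BANK_SHIFT) & (NUM_BANKS - 1));
}

/* Slot `slot` of color `color`: consecutive slots of one color are
 * NUM_BANKS * BANK_STRIDE apart, so they always land on the same bank. */
static void *colored_region(void *base, unsigned color, size_t slot)
{
    uintptr_t off = (uintptr_t)slot * NUM_BANKS * BANK_STRIDE
                  + (uintptr_t)color * BANK_STRIDE;
    return (char *)base + off;
}

int main(void)
{
    /* Try to back the region with a single 1 GB page (needs 1 GB pages
     * reserved by the OS); fall back to a normal mapping so the sketch
     * stays runnable on machines without reserved huge pages. */
    void *base = mmap(NULL, PAGE_1G, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                      -1, 0);
    if (base == MAP_FAILED)
        base = mmap(NULL, PAGE_1G, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* e.g., thread 3 confines its writes to bank color 3 */
    char *p = colored_region(base, 3, 0);
    printf("slot 0 of color 3 -> bank %u\n",
           bank_of((uintptr_t)p - (uintptr_t)base));

    munmap(base, PAGE_1G);
    return 0;
}
```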
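The write-streaming observation targets the read-for-ownership traffic that a write-allocate cache generates during scatter-heavy data shuffling. A common way to realize it, sketched below in C, is to stage each bucket's output in a cache-line-sized buffer and drain full buffers with non-temporal (streaming) stores; the bucket count, layout, and helper names (wc_bucket, wc_push, scatter) are illustrative assumptions rather than the paper's implementation.

```c
#include <emmintrin.h>   /* SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS  256
#define LINE_KEYS 16              /* 64-byte cache line / 4-byte key */

typedef struct {
    uint32_t buf[LINE_KEYS] __attribute__((aligned(64)));  /* staging line */
    int      fill;                /* valid keys currently staged */
    uint32_t *out;                /* next free slot in this bucket's output
                                     region; assumed to start 64-byte aligned
                                     and be pre-positioned by a histogram pass */
} wc_bucket;

/* Append one key; when a full cache line is staged, drain it with
 * non-temporal stores so the destination line is never read back
 * into the cache (no read-for-ownership under write-allocate). */
static inline void wc_push(wc_bucket *b, uint32_t key)
{
    b->buf[b->fill++] = key;
    if (b->fill == LINE_KEYS) {
        const __m128i *src = (const __m128i *)b->buf;
        __m128i *dst = (__m128i *)b->out;
        for (int i = 0; i < LINE_KEYS / 4; i++)
            _mm_stream_si128(dst + i, _mm_load_si128(src + i));
        b->out += LINE_KEYS;
        b->fill = 0;
    }
}

/* Radix-style scatter of 32-bit keys by their low byte. */
void scatter(const uint32_t *keys, size_t n, wc_bucket *buckets)
{
    for (size_t i = 0; i < n; i++)
        wc_push(&buckets[keys[i] & 0xFFu], keys[i]);

    /* Copy any partially filled staging lines with ordinary stores. */
    for (int b = 0; b < NBUCKETS; b++) {
        memcpy(buckets[b].out, buckets[b].buf,
               (size_t)buckets[b].fill * sizeof(uint32_t));
        buckets[b].out += buckets[b].fill;
        buckets[b].fill = 0;
    }
    _mm_sfence();   /* order the streaming stores before later reads */
}
```

With a plain scatter such as out[pos++] = key, a write-allocate cache first fetches each missed 64-byte destination line from DRAM before overwriting it; the staged, streamed variant issues only the writes, which is where the bandwidth saving in shuffle-heavy phases such as radix-sort partitioning comes from.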
