IEEE Computer Architecture Letters

Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications


Abstract

Utilizing small (e.g., 4 KB) pages incurs frequent TLB misses in modern big-memory applications, substantially degrading system performance. Large (e.g., 1 GB) pages or direct segments can alleviate this page-table-walk penalty, but at the same time such a strategy exposes the organizational and operational details of modern DRAM-based memory systems to applications. Row-buffer conflicts, caused by accesses from multiple threads that target the same DRAM bank but different rows, are regarded as the main culprit behind the very large gap between peak and achieved main-memory throughput; hardware-based approaches in memory controllers have achieved only limited success, and existing proposals that change memory allocators cannot be applied to large pages or direct segments. In this paper, we propose a set of application-level techniques to improve the effective main-memory bandwidth. The techniques stem from two key observations: 1) each thread of an application exclusively accesses certain datasets for a short or long period of time, and 2) superfluous memory reads caused by the cache's write-allocate policy can be avoided if scattered writes during data shuffling pass through intermediate, cache-friendly buffers. Experiments on a contemporary x86 server show that combining large pages with the proposed address-linearization, bank-coloring, and write-streaming techniques improves the performance of three big-memory applications, namely a high-throughput key-value store, fast Fourier transform, and radix sort, by 37.6, 22.9, and 68.1 percent, respectively.
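
To make the bank-coloring and address-linearization ideas concrete: because a 1 GB page is physically contiguous, the low 30 address bits an application sees are the same bits the memory controller uses to pick a DRAM bank, so a thread can steer its data onto a private set of banks entirely from user space. The C sketch below illustrates this under assumed parameters; the bank-bit positions (BANK_SHIFT), bank count (NUM_BANKS), and the helpers bank_of/colored_region are illustrative assumptions, not the paper's actual DRAM mapping or code.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << 26)   /* 30 == log2(1 GB), 26 == MAP_HUGE_SHIFT */
#endif

#define PAGE_1G     (1UL << 30)
#define BANK_SHIFT  13u           /* assumed: bank index taken from bits [16:13] */
#define NUM_BANKS   16u           /* assumed bank count */
#define BANK_STRIDE (1UL << BANK_SHIFT)

/* Bank index of an offset within the 1 GB page (assumed DRAM mapping). */
static unsigned bank_of(uintptr_t off)
{
    return (unsigned)((off >> BANK_SHIFT) & (NUM_BANKS - 1));
}

/* Slot `slot` of color `color`: consecutive slots of one color are
 * NUM_BANKS * BANK_STRIDE apart, so they always land on the same bank. */
static void *colored_region(void *base, unsigned color, size_t slot)
{
    uintptr_t off = (uintptr_t)slot * NUM_BANKS * BANK_STRIDE
                  + (uintptr_t)color * BANK_STRIDE;
    return (char *)base + off;
}

int main(void)
{
    /* Try to back the region with a single 1 GB page (needs 1 GB pages
     * reserved by the OS); fall back to a normal mapping so the sketch
     * stays runnable on machines without reserved huge pages. */
    void *base = mmap(NULL, PAGE_1G, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                      -1, 0);
    if (base == MAP_FAILED)
        base = mmap(NULL, PAGE_1G, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* e.g., thread 3 confines its writes to bank color 3 */
    char *p = colored_region(base, 3, 0);
    printf("slot 0 of color 3 -> bank %u\n",
           bank_of((uintptr_t)p - (uintptr_t)base));

    munmap(base, PAGE_1G);
    return 0;
}
```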
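The write-streaming observation targets the read-for-ownership traffic that a write-allocate cache generates during scatter-heavy data shuffling. A common way to realize it, sketched below in C, is to stage each bucket's output in a cache-line-sized buffer and drain full buffers with non-temporal (streaming) stores; the bucket count, layout, and helper names (wc_bucket, wc_push, scatter) are illustrative assumptions rather than the paper's implementation.

```c
#include <emmintrin.h>   /* SSE2: _mm_load_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS  256
#define LINE_KEYS 16              /* 64-byte cache line / 4-byte key */

typedef struct {
    uint32_t buf[LINE_KEYS] __attribute__((aligned(64)));  /* staging line */
    int      fill;                /* valid keys currently staged */
    uint32_t *out;                /* next free slot in this bucket's output
                                     region; assumed to start 64-byte aligned
                                     and be pre-positioned by a histogram pass */
} wc_bucket;

/* Append one key; when a full cache line is staged, drain it with
 * non-temporal stores so the destination line is never read back
 * into the cache (no read-for-ownership under write-allocate). */
static inline void wc_push(wc_bucket *b, uint32_t key)
{
    b->buf[b->fill++] = key;
    if (b->fill == LINE_KEYS) {
        const __m128i *src = (const __m128i *)b->buf;
        __m128i *dst = (__m128i *)b->out;
        for (int i = 0; i < LINE_KEYS / 4; i++)
            _mm_stream_si128(dst + i, _mm_load_si128(src + i));
        b->out += LINE_KEYS;
        b->fill = 0;
    }
}

/* Radix-style scatter of 32-bit keys by their low byte. */
void scatter(const uint32_t *keys, size_t n, wc_bucket *buckets)
{
    for (size_t i = 0; i < n; i++)
        wc_push(&buckets[keys[i] & 0xFFu], keys[i]);

    /* Copy any partially filled staging lines with ordinary stores. */
    for (int b = 0; b < NBUCKETS; b++) {
        memcpy(buckets[b].out, buckets[b].buf,
               (size_t)buckets[b].fill * sizeof(uint32_t));
        buckets[b].out += buckets[b].fill;
        buckets[b].fill = 0;
    }
    _mm_sfence();   /* order the streaming stores before later reads */
}
```

With a plain scatter such as out[pos++] = key, a write-allocate cache first fetches each missed 64-byte destination line from DRAM before overwriting it; the staged, streamed variant issues only the writes, which is where the bandwidth saving in shuffle-heavy phases such as radix-sort partitioning comes from.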
