Conference: IEEE International Parallel and Distributed Processing Symposium

TintMalloc: Reducing Memory Access Divergence via Controller-Aware Coloring

Abstract

The DRAM of modern multicores is partitioned into sets, each with its own memory controller governing multiple banks. Accesses to different controllers and banks can be served in parallel, but sharing either between threads causes contention that increases latency, as do accesses to remote controllers under the non-uniform memory access (NUMA) design. Above DRAM, a last-level cache (LLC), typically at level 3 (L3), is shared by all cores, while L1 and L2 caches tend to be core-private. This NUMA design inflicts significant variations in execution time on applications with large datasets, because of the differing latencies incurred by remote memory node accesses and by contention in the LLC and at memory banks/controllers. As a result, single program multiple data (SPMD) applications tend to experience computational imbalance at barriers, which imposes idle (wait) time on threads that arrive early at barriers and thus impairs effective processor utilization and, ultimately, performance. This work contributes a novel memory allocator called TintMalloc that colors memory at the LLC, bank, and controller level to ensure locality to the local memory node while reducing contention at the LLC/bank levels in software. After adding one line of code during initialization in each thread, existing applications automatically obtain colored heap space through regular malloc calls. Experimental results with the SPEC and PARSEC benchmarks show that by choosing disjoint colors per thread, locality is increased, contention is decreased, and overall SPMD execution becomes more balanced at barriers than under default Linux memory allocation or prior coloring approaches.
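
The coloring idea in the abstract can be illustrated with a minimal sketch. It is not TintMalloc's implementation: the bit positions, the `Color` type, and the helper names `color_of`, `disjoint_colors`, and `frames_for_thread` are all hypothetical, and real address-to-controller/bank/LLC mappings are hardware-specific. The sketch only shows how a physical page address might be decoded into colors and how threads could be given disjoint color sets.

```python
from typing import NamedTuple

# Hypothetical (low_bit, width) fields of a physical address, all above the
# 4 KiB page offset (bits 0-11). These values are assumptions for
# illustration and are not taken from the paper or any specific CPU.
LLC_BITS = (12, 5)         # 32 LLC colors
BANK_BITS = (17, 3)        # 8 banks per controller
CONTROLLER_BITS = (20, 2)  # 4 memory controllers


class Color(NamedTuple):
    controller: int
    bank: int
    llc: int


def _field(addr: int, bits: tuple) -> int:
    """Extract the bit field described by (low_bit, width) from addr."""
    low, width = bits
    return (addr >> low) & ((1 << width) - 1)


def color_of(phys_addr: int) -> Color:
    """Decode the controller, bank, and LLC color of a physical address."""
    return Color(
        controller=_field(phys_addr, CONTROLLER_BITS),
        bank=_field(phys_addr, BANK_BITS),
        llc=_field(phys_addr, LLC_BITS),
    )


def disjoint_colors(num_threads: int, num_colors: int) -> list:
    """Split color ids 0..num_colors-1 round-robin; no two threads share one."""
    return [set(range(t, num_colors, num_threads)) for t in range(num_threads)]


def frames_for_thread(frames, controller, banks, llc_colors) -> list:
    """Keep only the page frames whose colors match a thread's assignment."""
    selected = []
    for addr in frames:
        c = color_of(addr)
        if c.controller == controller and c.bank in banks and c.llc in llc_colors:
            selected.append(addr)
    return selected
```

Under these assumptions, a thread pinned to a core near controller 2 would be handed only frames whose decoded controller is 2, and whose bank and LLC colors fall in that thread's private sets, so that its heap pages stay local and do not collide with other threads' pages in banks or LLC sets.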
