Beyond the Socket: NUMA-Aware GPUs


Abstract

GPUs achieve high throughput and power efficiency by employing many small single-instruction, multiple-thread (SIMT) cores. To minimize scheduling logic and performance variance, they utilize a uniform memory system and leverage strong data parallelism exposed via the programming model. With Moore's law slowing, for GPUs to continue scaling performance (which largely depends on SIMT core count), they are likely to embrace multi-socket designs where transistors are more readily available. However, when moving to such designs, maintaining the illusion of a uniform memory system is increasingly difficult. In this work we investigate multi-socket non-uniform memory access (NUMA) GPU designs and show that significant changes are needed to both the GPU interconnect and cache architectures to achieve performance scalability. We show that application phase effects can be exploited, allowing GPU sockets to dynamically optimize their individual interconnect and cache policies and minimizing the impact of NUMA effects. Our NUMA-aware GPU outperforms a single GPU by 1.5×, 2.3×, and 3.2× while achieving 89%, 84%, and 76% of theoretical application scalability in 2-, 4-, and 8-socket designs, respectively. Implementable today, NUMA-aware multi-socket GPUs may be a promising candidate for scaling GPU performance beyond a single socket.
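
A quick way to read the scaling figures in the abstract: if each percentage is the achieved speedup divided by the application's theoretical speedup (our reading; the abstract does not define that baseline further), the implied theoretical ceilings are roughly 1.7×, 2.7×, and 4.2×, well below linear 2×, 4×, and 8×. Measured against ideal linear scaling instead, the same speedups correspond to 75%, about 57%, and 40% efficiency. The sketch below does this arithmetic; the function names are our own, and the inputs are the rounded values from the abstract, so the outputs are approximate.

```python
# Rounded figures quoted in the abstract:
# sockets -> (speedup over a single GPU, fraction of theoretical scalability)
REPORTED = {
    2: (1.5, 0.89),
    4: (2.3, 0.84),
    8: (3.2, 0.76),
}


def implied_ceiling(speedup: float, fraction: float) -> float:
    """Theoretical speedup implied if `fraction` means achieved / theoretical."""
    return speedup / fraction


def linear_efficiency(speedup: float, sockets: int) -> float:
    """Speedup as a fraction of ideal linear scaling (N sockets -> N x)."""
    return speedup / sockets


for n, (s, f) in REPORTED.items():
    print(f"{n} sockets: implied ceiling ~{implied_ceiling(s, f):.2f}x, "
          f"linear efficiency {linear_efficiency(s, n):.0%}")
```

Under this reading, the gap between "percent of theoretical scalability" and "percent of linear scaling" reflects how much each application could scale even on an idealized, non-NUMA machine.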

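The abstract does not spell out how a socket adapts its interconnect to application phases. As a rough illustration of what a per-socket, phase-driven link policy could look like, the sketch below models a link whose lanes can be reassigned between the outbound (egress) and inbound (ingress) directions based on sampled utilization. The class name, lane counts, thresholds, and the one-lane-per-sampling-period rule are all hypothetical choices for illustration, not the paper's mechanism.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only (not from the paper).
HIGH_UTIL = 0.95  # a direction at least this busy is treated as saturated
LOW_UTIL = 0.50   # a direction at most this busy can give up a lane


@dataclass
class SocketLink:
    """Hypothetical link of `total_lanes` lanes, each assignable to egress or ingress."""
    total_lanes: int
    egress_lanes: int
    min_lanes: int = 1  # never leave either direction with zero lanes

    @property
    def ingress_lanes(self) -> int:
        return self.total_lanes - self.egress_lanes

    def rebalance(self, egress_util: float, ingress_util: float) -> None:
        """Move at most one lane per sampling period toward a saturated direction."""
        if (egress_util >= HIGH_UTIL and ingress_util <= LOW_UTIL
                and self.ingress_lanes > self.min_lanes):
            self.egress_lanes += 1
        elif (ingress_util >= HIGH_UTIL and egress_util <= LOW_UTIL
                and self.egress_lanes > self.min_lanes):
            self.egress_lanes -= 1


# A phase dominated by outbound traffic, then one dominated by inbound traffic
# (made-up utilization samples).
link = SocketLink(total_lanes=8, egress_lanes=4)
for eg, ing in [(0.99, 0.30), (0.99, 0.30), (0.20, 0.98), (0.20, 0.98), (0.20, 0.98)]:
    link.rebalance(eg, ing)
    print(f"egress={link.egress_lanes} ingress={link.ingress_lanes}")
```

Moving a single lane per period trades reaction speed for stability; a real design would also have to account for lane turnaround latency and would pair this with an analogous policy for cache capacity, both of which this sketch ignores.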

Bibliographic details

Source: International Symposium on Microarchitecture | 2017 | xix, 825 p. | 13 pages
Authors: Ugljesa Milic; Oreste Villa; Evgeny Bolotin; Akhil Arunkumar; Eiman Ebrahimi; Aamer Jaleel; Alex Ramirez; David Nellans
Original format: PDF
Chinese Library Classification: TP302-532
Keywords: cache storage; graphics processing units; integrated circuit design; multiprocessing systems; multi-threading; power aware computing


Similar literature


1. NUMA-aware image compositing on multi-GPU platform [J]. Pan Wang, Zhiquan Cheng, Ralph Martin. The Visual Computer, 2013, No. 6-8.
2. Strength evaluation of prosthetic check sockets, copolymer sockets, and definitive laminated sockets [J]. Maria J. Gerschutz PhD, Michael L. Haynes MS, Derek Nixon BS. Journal of Rehabilitation Research and Development, 2012, No. 3.
3. Soft-tissue esthetic outcome of single implants: Immediate placement in fresh extraction sockets versus conventional placement in healed sockets [J]. Nima Naddaf Pour, Baharak Ghaedi, Mona Sohrabi. Journal of Indian Society of Periodontology, 2018, No. 3.
4. Beyond the Socket: NUMA-Aware GPUs [C]. Ugljesa Milic, Oreste Villa, Evgeny Bolotin. Annual IEEE/ACM International Symposium on Microarchitecture, 2017.
5. Finite Element Modeling of a Model-scale, Rock-socketed Pile under Cyclic Lateral Loading [D]. Aditya Sunil Bajaj. 2018.
6. Immediate Implant Placement in Non-Infected Sockets versus Infected Sockets: a Systematic Review and Meta-Analysis [O]. Aza Saijeva, Gintaras Juodzbalys. 2020.
7. NUMA-aware image compositing on multi-GPU platform [O]. Pan Wang, Zhiquan Cheng, Ralph Robert Martin. 2013.
