GPU Behavior on a Large HPC Cluster

Abstract

We discuss observed characteristics of GPUs deployed as accelerators in an HPC cluster at Los Alamos National Laboratory (LANL). GPUs have a very good theoretical FLOPS rate and are reasonably inexpensive and available, but they are relatively new to HPC, which demands both consistently high performance across nodes and a consistently low error rate. We modified a standard acceptance procedure to test GPU performance, error rate, and reliability characteristics, and ran the test suite on a Fermi HPC cluster at LANL. We discuss here our methodology for this testing and present results relevant to the deployment of GPUs in an HPC environment. In this paper we show performance variability, power usage variability (possibly related to the performance variability), and some reliability concerns on the GPUs tested. We argue for rigorous testing of these devices in deployment as a way of characterizing their behavior.

Bibliographic record

Source
International Euro-Par Conference, 2014, 10 pages
Authors
Nathan DeBardeleben; Sean Blanchard; Laura Monroe; Phil Romero; Daryl Grunau; Craig Idler; Cornell Wright
Original format: PDF
Chinese Library Classification: TP311.133.2-53
Keywords
Graphics processing units; High performance computing; Reliability; Acceptance testing; Fault-tolerance; Resilience; Error correction

Similar literature

1. Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters [J]. Jianqi Lai, Hang Yu, Zhengyu Tian. Scientific Programming. 2020, issue 2a3
2. Correction to: Leveraging HPC accelerator architectures with modern techniques - hydrologic modeling on GPUs with ParFlow [J]. Hokkanen Jaro, Kollet Stefan, Kraus Jiri. Computational Geosciences. 2021, issue 5
3. Leveraging HPC accelerator architectures with modern techniques - hydrologic modeling on GPUs with ParFlow [J]. Hokkanen Jaro, Kollet Stefan, Kraus Jiri. Computational Geosciences. 2021, issue 5
4. GPU Behavior on a Large HPC Cluster [C]. Nathan DeBardeleben, Sean Blanchard, Laura Monroe. Parallel Processing Workshops. 2014
5. Improving communication performance in GPU-accelerated HPC clusters [D]. Faraji, Iman. 2018
6. GPUs Outperform Current HPC and Neuromorphic Solutions in Terms of Speed and Energy When Simulating a Highly-Connected Cortical Model [O]. James C. Knight, Thomas Nowotny. 2018
7. Un gestor de GPUs remotas para clusters HPC (A remote GPU manager for HPC clusters) [O]. Iserte Sergio. 2014
