International Euro-Par Conference

GPU Behavior on a Large HPC Cluster

Abstract

We discuss observed characteristics of GPUs deployed as accelerators in an HPC cluster at Los Alamos National Laboratory (LANL). GPUs offer a high theoretical FLOPS rate and are reasonably inexpensive and readily available, but they are relatively new to HPC, which demands both consistently high performance across nodes and a consistently low error rate. We modified a standard acceptance procedure to test GPU performance, error rate, and reliability characteristics, and ran the test suite on a Fermi HPC cluster at LANL. We describe our testing methodology and present results relevant to deploying GPUs in an HPC environment. In this paper we show performance variability, power usage variability (possibly related), and some reliability concerns on the GPUs tested. We argue for rigorous testing of these devices in deployment as a way of characterizing their behavior.
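The abstract does not say how variability was quantified. As a rough illustration only, the sketch below assumes per-GPU benchmark throughput (for example, GFLOPS) and per-GPU power readings have already been collected. It reports the coefficient of variation across GPUs, flags GPUs that fall more than a chosen fraction below the median, and computes a Pearson correlation between performance and power to probe whether the two are related. The function names, the 5% tolerance, and the sample numbers are hypothetical; this is not the authors' acceptance procedure, and the numbers are not measured results.

```python
from math import sqrt
from statistics import mean, median, stdev


def summarize_variability(gflops_by_gpu, tolerance=0.05):
    """Summarize spread of a per-GPU benchmark and flag slow devices.

    gflops_by_gpu: hypothetical mapping of GPU/node id -> measured GFLOPS.
    tolerance: assumed fraction below the median at which a GPU is flagged.
    """
    values = list(gflops_by_gpu.values())
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    avg = mean(values)
    cv = stdev(values) / avg  # coefficient of variation: spread relative to mean
    med = median(values)
    cutoff = med * (1 - tolerance)  # anything below this is treated as an outlier
    slow = sorted(gpu for gpu, v in gflops_by_gpu.items() if v < cutoff)
    return {"mean": avg, "median": med, "cv": cv, "slow": slow}


def pearson(xs, ys):
    """Pearson correlation, e.g. between per-GPU performance and power draw."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Illustrative (made-up) numbers, not data from the paper.
    perf = {"n001": 500.0, "n002": 505.0, "n003": 495.0, "n004": 420.0}
    report = summarize_variability(perf)
    print(report["slow"])  # GPUs more than 5% below the median
```

With the illustrative input, the median is 497.5 GFLOPS, so only `n004` falls below the 5% cutoff. Comparing against the median rather than the mean keeps a single slow device from dragging the threshold down.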