
The Anatomy of Mr. Scan: A Dissection of Performance of an Extreme Scale GPU-Based Clustering Algorithm


Abstract

The emergence of leadership-class systems with GPU-equipped nodes has the potential to vastly increase the performance of existing distributed applications. However, adding GPU computation to existing extreme-scale distributed applications can reveal scalability issues that were absent in the CPU version. The scaling issues exposed by a GPU can become limiting factors for overall application performance. We developed an extreme-scale GPU-based application, called Mr. Scan, to perform data clustering on multi-billion-point datasets, and in it we ran into several of these performance-limiting issues. Through complete end-to-end benchmarking of Mr. Scan (measuring time from reading and distribution to final output), we were able to identify three major sources of real-world performance issues: data distribution, GPU load balancing, and system-specific issues such as start-up time. These issues accounted for the vast majority of the run time of Mr. Scan. Data distribution alone accounted for 68% of the total run time of Mr. Scan when processing 6.5 billion points on Cray Titan at 8192 nodes. With improvements in these areas, we have been able to cut the total run time of Mr. Scan from 17.5 minutes to 8.3 minutes when clustering 6.5 billion points.
