The Anatomy of Mr. Scan: A Dissection of Performance of an Extreme Scale GPU-Based Clustering Algorithm

Abstract

The emergence of leadership-class systems with GPU-equipped nodes has the potential to vastly increase the performance of existing distributed applications. However, adding GPU computation to existing extreme-scale distributed applications can reveal scalability issues that were absent in the CPU version. The scaling issues exposed by a GPU can become limiting factors for overall application performance. We developed an extreme-scale GPU-based application to perform data clustering on multi-billion-point datasets. In this application, called Mr. Scan, we ran into several of these performance-limiting issues. Through complete end-to-end benchmarking of Mr. Scan (measuring time from reading and distribution to final output), we were able to identify three major sources of real-world performance issues: data distribution, GPU load balancing, and system-specific issues such as start-up time. These issues accounted for the vast majority of Mr. Scan's run time. Data distribution alone accounted for 68% of the total run time when processing 6.5 billion points on Cray Titan at 8192 nodes. With improvements in these areas, we have been able to cut the total run time of Mr. Scan from 17.5 minutes to 8.3 minutes when clustering 6.5 billion points.

Bibliographic details

Source
Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems; International Conference for High Performance Computing, Networking, Storage and Analysis | 2014 | pp. 54-60 | 7 pages
Authors
Welton Benjamin; Miller Barton P.;
Original format: PDF
Keywords
distributed algorithms; graphics processing units; multiprocessing systems; pattern clustering; resource allocation; CPU version; GPU load balancing; GPU-equipped nodes; Mr. scan anatomy; data clustering; data distribution; extreme scale GPU-based clustering algorithm; leadership class systems; multibillion point datasets; time 17.5 min to 8.3 min; Benchmark testing; Clustering algorithms; Graphics processing units; Load management; Partitioning algorithms; Scalability; Spatial indexes; Distributed Systems; GPU Data Clustering; DBSCAN; Performance Analysis;
Indexed: 2022-08-26 15:16:41

Similar literature

1. Cadaver-specific CT scans visualized at the dissection table combined with virtual dissection tables improve learning performance in general gross anatomy [J]. Paech Daniel, Giesel Frederik L., Unterhinninghofen Roland. European Radiology. 2017, No. 5
2. GPUSCAN: GPU-Based Parallel Structural Clustering Algorithm for Networks [J]. Stovall Thomas Ryan, Kockara Sinan, Avci Recep. Parallel and Distributed Systems, IEEE Transactions on. 2015, No. 12
3. Performance of new GPU-based scan-conversion algorithm implemented using OpenGL [J]. Steelman WA, Richard WD. Ultrasonic Imaging: An International Journal. 2011, No. 2
4. Mr. Scan: Extreme scale density-based clustering using a tree-based network of GPGPU nodes [C]. Welton Benjamin, Samanas Evan, Miller Barton P. International Conference for High Performance Computing, Networking, Storage and Analysis. 2013
5. GPU-Based Parallel Algorithms With Architecture-Aware Optimization for Large-Scale Process Simulation of Biological Pathways and High-Throughput Homologous Sequence Search [D]. Jiang, Hanyu. 2018
6. Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms [O]. José M. Cecilia, Juan-Carlos Cano, Juan Morales-García. 2020
7. The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale [O]. Jack Dongarra, Mark Gates, Azzam Haidar. 2018
