Speeding Up Nek5000 with Autotuning and Specialization

Abstract

Autotuning technology has emerged recently as a systematic process for evaluating alternative implementations of a computation, in order to select the best-performing solution for a particular architecture. Specialization optimizes code customized to a particular class of input data set. In this paper, we demonstrate how compiler-based autotuning that incorporates specialization for expected data set sizes of key computations can be used to speed up Nek5000, a spectral-element code. Nek5000 makes heavy use of what are effectively Basic Linear Algebra Subroutine (BLAS) calls, but for very small matrices. Through autotuning and specialization, we can achieve significant performance gains over hand-tuned libraries (e.g., Goto, ATLAS, and ACML BLAS). Additional performance gains are obtained from using higher-level compiler optimizations that aggregate multiple BLAS calls. We demonstrate more than 2.2X performance gains on an Opteron over the original manually tuned implementation, and speedups of up to 1.26X on the entire application running on 256 nodes of the Cray XT5 Jaguar system at Oak Ridge.
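The two ideas named in the abstract can be illustrated with a toy sketch. This is not the paper's compiler-based framework: the function names (`matmul_generic`, `make_specialized`, `autotune`) are hypothetical, Python is used only for brevity, and the candidate set is deliberately tiny. "Specialization" here means generating a fully unrolled multiply for one fixed matrix size, and "autotuning" means timing each candidate on representative input and keeping the fastest.

```python
import timeit


def matmul_generic(a, b, n):
    """Generic triple loop for n x n matrices stored as nested lists."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def make_specialized(n):
    """Generate a multiply with all loops unrolled for one fixed size n.

    Illustrative stand-in for size specialization; the returned function
    keeps the (a, b, n) signature so it is interchangeable with the
    generic version, but it ignores n at call time.
    """
    rows = []
    for i in range(n):
        cells = []
        for j in range(n):
            cells.append(" + ".join(f"a[{i}][{k}]*b[{k}][{j}]" for k in range(n)))
        rows.append("[" + ", ".join(cells) + "]")
    src = "def matmul_fixed(a, b, n):\n    return [" + ", ".join(rows) + "]\n"
    namespace = {}
    exec(src, namespace)
    return namespace["matmul_fixed"]


def autotune(variants, a, b, n, number=200):
    """Time every candidate on the given input and return the fastest name."""
    timings = {}
    for name, fn in variants.items():
        # Take the minimum of three runs to reduce timing noise.
        timings[name] = min(timeit.repeat(lambda: fn(a, b, n), number=number, repeat=3))
    best = min(timings, key=timings.get)
    return best, timings


n = 4  # assumed "expected" matrix size, chosen only for the example
a = [[float(i * n + j) for j in range(n)] for i in range(n)]
b = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix
variants = {"generic": matmul_generic, "specialized": make_specialized(n)}
best, timings = autotune(variants, a, b, n)
print("selected variant:", best)
```

Which variant wins depends on the machine and interpreter, which is the point of measuring empirically rather than predicting.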

Bibliographic record

Source
24th ACM International Conference on Supercomputing 2010 | 2010 | pp. 253-262 | 10 pages
Conference location: Amsterdam (NL)
Authors
Jaewook Shin; Chun Chen; Mary W. Hall; Paul F. Fischer; Jacqueline Chame; Paul D. Hovland
Author affiliations

Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439;

University of Utah, 50 S. Central Campus Dr., Salt Lake City, UT 84112;

University of Utah, 50 S. Central Campus Dr., Salt Lake City, UT 84112;

Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439;

USC/ISI, 4676 Admiralty Way, Marina del Rey, CA 90292;

Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439;

Original format: PDF
Language: English (eng)
Chinese Library Classification: Computing technology, computer technology
Keywords
empirical performance tuning; autotuning; specialization;

Similar literature

1. Speeding up AutoTuning of the Memory Management Options in Data Analytics [J]. Kunjir Mayuresh. Distributed and Parallel Databases. 2020, No. 4

2. Autotuning Runtime Specialization for Sparse Matrix-Vector Multiplication [J]. Yilmaz Buse, Aktemur Baris, Garzaran Maria J. ACM Transactions on Architecture and Code Optimization. 2016, No. 1

3. Tools for machine-learning-based empirical autotuning and specialization [J]. Nicholas Chaimov, Scott Biersdorff, Allen D Malony. Experimental Mechanics. 2013, No. 4

4. Speeding Up Nek5000 with Autotuning and Specialization [C]. Jaewook Shin, Chun Chen, Mary W. Hall. ACM international conference on supercomputing. 2010

5. Using Traffic Signal Control to Better Serve Pedestrians and Limit Speeding on Urban Arterials: Adaptive Walk Intervals and Speeding Opportunities [D]. Halawani, Ahmed T. M. 2018

6. Are Current Law Enforcement Strategies Associated with a Lower Risk of Repeat Speeding Citations and Crash Involvement? A Longitudinal Study of Speeding Maryland Drivers [O]. Jingyi Li, Sania Amr, Elisa R. Braver

7. Speeding Up Nek5000 with Autotuning and Specialization [O]. Jaewook Shin, Mary W. Hall, Jacqueline Chame. 2015
