International Symposium on Signal Processing and Information Technology
Multilevel Approaches to Fine Tune Performance of Linear Algebra Libraries

Abstract

We propose a multilevel methodology to improve the performance of parallel codes whose run time grows faster than their workload. For a simple parallel computing model, we derive the conditions under which the proposed methodology improves performance, along with formulas that predict the attainable improvement. The effectiveness of the proposed strategy is demonstrated by applying it to the highly optimized BLAS (Basic Linear Algebra Subprograms) routines cblas_dgemm, cblas_dtrmm, and cblas_dsymm from the Intel MKL (Math Kernel Library) on the Intel KNL (Knights Landing) platform. We reduce the run time of MKL cblas_dgemm by 20%, cblas_dtrmm by 15%, and cblas_dsymm by 50% on double-precision matrices of size 16K×16K. Further, our performance prediction formulas are demonstrated to be accurate on this platform.
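The core idea behind a multilevel GEMM strategy of this kind is to split one large matrix product into smaller block sub-products, each small enough to stay in the kernel's efficient operating range, and accumulate their results. The sketch below illustrates the block partitioning only; it is a hypothetical pure-Python illustration (the names `matmul`, `blocked_matmul`, and the block size `bs` are ours, not the paper's), standing in for calls to cblas_dgemm so the example is self-contained.

```python
def matmul(A, B):
    """Naive dense product of lists-of-lists: C = A @ B."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def blocked_matmul(A, B, bs):
    """Compute A @ B as a sum of bs-by-bs block sub-products,
    accumulating C[i][j] += A[i][p] * B[p][j] block by block.
    In an MKL-based implementation, each block sub-product would be
    one smaller cblas_dgemm call (with beta = 1 to accumulate)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, bs):          # block row of C
        for p0 in range(0, k, bs):      # block index of the inner dimension
            for j0 in range(0, m, bs):  # block column of C
                for i in range(i0, min(i0 + bs, n)):
                    for p in range(p0, min(p0 + bs, k)):
                        a = A[i][p]
                        for j in range(j0, min(j0 + bs, m)):
                            C[i][j] += a * B[p][j]
    return C
```

Because each sub-product touches only a fraction of the data, the per-call run time can grow more slowly than for one monolithic call, which is exactly the regime (run time growing faster than workload) that the proposed methodology targets.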
