International Symposium on Signal Processing and Information Technology
Multilevel Approaches to Fine Tune Performance of Linear Algebra Libraries

Abstract

We propose a multilevel methodology to improve the performance of parallel codes whose run time grows faster than their workload. For a simple parallel computing model, we derive the conditions under which the proposed methodology improves performance, along with formulas that predict the attainable improvement. The effectiveness of the proposed strategy is demonstrated by applying it to the highly optimized BLAS (Basic Linear Algebra Subprograms) routines cblas_dgemm, cblas_dtrmm, and cblas_dsymm from the Intel MKL (Math Kernel Library) on the Intel KNL (Knights Landing) platform. We reduce the run time of MKL cblas_dgemm by 20%, cblas_dtrmm by 15%, and cblas_dsymm by 50% on double-precision matrices of size 16K×16K. Further, our performance prediction formulas are demonstrated to be accurate on this platform.
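The core idea behind a multilevel GEMM strategy of this kind is to split one large matrix product into smaller block sub-products, each small enough to stay in the kernel's efficient operating range, and accumulate their results. The sketch below illustrates the block partitioning only; it is a hypothetical pure-Python illustration (the names `matmul`, `blocked_matmul`, and the block size `bs` are ours, not the paper's), standing in for calls to cblas_dgemm so the example is self-contained.

```python
def matmul(A, B):
    """Naive dense product of lists-of-lists: C = A @ B."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def blocked_matmul(A, B, bs):
    """Compute A @ B as a sum of bs-by-bs block sub-products,
    accumulating C[i][j] += A[i][p] * B[p][j] block by block.
    In an MKL-based implementation, each block sub-product would be
    one smaller cblas_dgemm call (with beta = 1 to accumulate)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, bs):          # block row of C
        for p0 in range(0, k, bs):      # block index of the inner dimension
            for j0 in range(0, m, bs):  # block column of C
                for i in range(i0, min(i0 + bs, n)):
                    for p in range(p0, min(p0 + bs, k)):
                        a = A[i][p]
                        for j in range(j0, min(j0 + bs, m)):
                            C[i][j] += a * B[p][j]
    return C
```

Because each sub-product touches only a fraction of the data, the per-call run time can grow more slowly than for one monolithic call, which is exactly the regime (run time growing faster than workload) that the proposed methodology targets.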
