Journal: Future Generation Computer Systems

Tuning linear algebra for energy efficiency on multicore machines by adapting the ATLAS library

Abstract

While automated tuning is an established method for minimising the execution time of scientific applications, it has rarely been used to minimise energy consumption automatically. This article presents a study on how to adapt the auto-tuned linear algebra library ATLAS so that its tuning decisions take the energy consumption of the execution into account. For different tuning parameters of ATLAS, it investigates which differences occur in the tuning results when ATLAS is tuned for minimal execution time versus minimal energy consumption. The tuning parameters include the matrix size for the low-level matrix multiplication, loop unrolling factors, crossover points between different matrix-multiplication implementations, the minimum size for matrices to be transposed, and blocking sizes for the last-level cache. Parameters for multithreaded execution, such as the number of threads and thread affinity, are also investigated. The emphasis of this article is on a proposed method for replacing a tuning process for execution time with one for energy consumption, especially in the parallel case; ATLAS serves as a prominent example of a tuned library. Furthermore, the article draws conclusions on how to design an energy-optimising autotuning package and how to choose tuning parameters. It also discusses why matrix-matrix multiplication has the potential to increase energy efficiency while time efficiency remains constant, whereas other routines have been shown to improve their energy efficiency by reducing execution time.
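The core idea of swapping the objective of an empirical tuning search from execution time to energy can be illustrated with a small sketch. The Python below is neither ATLAS code nor the authors' method. `blocked_matmul` is a hypothetical stand-in for a kernel with one tuning parameter (the block size `nb`), and `tune` is a generic exhaustive search. `measure_energy` assumes the Linux powercap RAPL interface under `/sys/class/powercap/intel-rapl:0/`, whose availability and permissions vary by machine.

```python
import os
import time
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

# Assumption: Linux powercap RAPL counters for CPU package 0 (microjoules).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")
RAPL_ENERGY = RAPL_DIR / "energy_uj"
RAPL_MAX = RAPL_DIR / "max_energy_range_uj"

Matrix = List[List[float]]
T = TypeVar("T")  # type of a tuning-parameter value
K = TypeVar("K")  # type of the kernel handed to the measurement function


def blocked_matmul(a: Matrix, b: Matrix, n: int, nb: int) -> Matrix:
    """Hypothetical kernel: C = A * B for n x n matrices, blocked with size nb."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + nb, n)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + nb, n)):
                            ci[j] += aik * bk[j]
    return c


def measure_time(kernel: Callable[[], object]) -> float:
    """Wall-clock seconds for one run of the kernel."""
    start = time.perf_counter()
    kernel()
    return time.perf_counter() - start


def measure_energy(kernel: Callable[[], object]) -> float:
    """Package energy in joules for one run, read from the (assumed) RAPL counter."""
    max_uj = int(RAPL_MAX.read_text())
    before = int(RAPL_ENERGY.read_text())
    kernel()
    after = int(RAPL_ENERGY.read_text())
    # The counter wraps around at max_energy_range_uj; handle at most one wrap.
    delta = after - before if after >= before else after + max_uj - before
    return delta / 1e6


def tune(candidates: Iterable[T], make_kernel: Callable[[T], K],
         measure: Callable[[K], float], repeats: int = 3) -> Tuple[Optional[T], float]:
    """Exhaustive search returning the candidate with the lowest measured cost.

    Each candidate is measured `repeats` times and the minimum is kept to damp
    noise; on a tie the earlier candidate wins.
    """
    best_value: Optional[T] = None
    best_cost = float("inf")
    for value in candidates:
        kernel = make_kernel(value)
        cost = min(measure(kernel) for _ in range(repeats))
        if cost < best_cost:
            best_value, best_cost = value, cost
    return best_value, best_cost


if __name__ == "__main__":
    n = 48
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    make = lambda nb: (lambda: blocked_matmul(a, b, n, nb))
    print("time-optimal nb:", tune([4, 8, 16, 48], make, measure_time))
    # Only attempt the energy objective if the RAPL files are readable.
    if os.access(RAPL_ENERGY, os.R_OK) and os.access(RAPL_MAX, os.R_OK):
        print("energy-optimal nb:", tune([4, 8, 16, 48], make, measure_energy))
```

Because the objective is just a function passed to `tune`, the same search can be run once with `measure_time` and once with `measure_energy`, and the two winning parameter values compared. RAPL reports package-level energy, so background activity on the machine is included in every reading, and reading the counter often requires elevated privileges.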