Parallel, Distributed and Network-Based Processing (PDP), 2012 20th Euromicro International Conference on

Improving Linear Algebra Computation on NUMA Platforms through Auto-tuned Nested Parallelism

Abstract

The most computationally demanding scientific and engineering problems are solved on large parallel systems. Some of these systems are Non-Uniform Memory Access (NUMA) multiprocessors, made up of a large number of cores that share a hierarchically organized memory. Basic linear algebra routines such as those in BLAS typically form the computational kernel for these problems, so using them efficiently on such systems would speed up the solution of a wide range of scientific problems. Normally a multithreaded BLAS library optimized for the system is used, but performance degrades significantly as the number of cores increases, which can leave large, expensive systems poorly used. This paper empirically analyses the behaviour on large NUMA systems of the BLAS matrix multiplication routine, and of its combination with OpenMP to obtain nested parallelism. With the auto-tuning method proposed in this work, execution time is reduced relative to the library's matrix multiplication.
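
The abstract does not spell out the auto-tuning procedure, so the following is only a minimal sketch of the general idea under stated assumptions: split a fixed thread budget into an outer level (row blocks of the product) and an inner level (work inside each block), time every split, and keep the fastest. It is written in pure Python with standard-library thread pools standing in for OpenMP and a multithreaded BLAS; the names `split_range`, `nested_matmul`, `thread_splits` and `autotune` are hypothetical and do not come from the paper.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into contiguous, nearly equal (start, stop) chunks."""
    parts = max(1, min(parts, n))
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def nested_matmul(a, b, outer, inner):
    """Multiply list-of-lists matrices with two nested levels of threads.

    Assumption: the outer pool stands in for an OpenMP parallel region and
    the inner pool for the threads of a multithreaded BLAS call.
    """
    cols = list(zip(*b))  # columns of b, computed once

    def rows_product(bounds):
        start, stop = bounds
        return [[sum(x * y for x, y in zip(a[i], c)) for c in cols]
                for i in range(start, stop)]

    def row_block(bounds):
        r0, r1 = bounds
        # Inner level: split this row block again among `inner` threads.
        sub = [(r0 + s, r0 + e) for s, e in split_range(r1 - r0, inner)]
        with ThreadPoolExecutor(max_workers=inner) as pool:
            parts = list(pool.map(rows_product, sub))
        return [row for part in parts for row in part]

    # Outer level: one task per row block of the result.
    with ThreadPoolExecutor(max_workers=outer) as pool:
        blocks = list(pool.map(row_block, split_range(len(a), outer)))
    return [row for block in blocks for row in block]


def thread_splits(total):
    """All (outer, inner) pairs whose product equals the thread budget."""
    return [(o, total // o) for o in range(1, total + 1) if total % o == 0]


def autotune(a, b, total_threads, repeats=3):
    """Time every (outer, inner) split and return the fastest one."""
    best = None
    for outer, inner in thread_splits(total_threads):
        timings = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            nested_matmul(a, b, outer, inner)
            timings.append(time.perf_counter() - t0)
        fastest = min(timings)
        if best is None or fastest < best[0]:
            best = (fastest, outer, inner)
    return best[1], best[2]
```

Calling `autotune(a, b, 4)` tries the splits (1, 4), (2, 2) and (4, 1) and returns the pair with the lowest measured time. Standard CPython threads will not actually speed up this arithmetic, so the sketch shows only the structure of the search; on a real NUMA machine the timed kernel would be the library's matrix multiplication called from inside an OpenMP parallel region.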