Conference: International Conference on Parallel Processing and Applied Mathematics

Implementation and Optimization of Dense LU Decomposition on the Stream Processor

Abstract

Developing scientific computing applications on stream processors has attracted considerable attention from researchers. In this paper, we implement and optimize dense LU decomposition on a stream processor. Unlike existing parallel algorithms for LU decomposition, the StreamLUD algorithm aims to exploit producer-consumer locality and to overlap off-chip memory access with kernel execution. Simulation results show that, across matrices of different sizes, the optimized StreamLUD achieves speedups ranging from 2.56 to 3.64 over the LU decomposition of HPL on an Itanium 2 processor.
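For readers unfamiliar with the operation being optimized, the following is a minimal sketch of dense LU decomposition with partial pivoting in plain Python. It is not the StreamLUD algorithm, whose details the abstract does not give; the function names `lu_decompose` and `reconstruct` are hypothetical and chosen only for illustration.

```python
def lu_decompose(a):
    """Factor a square matrix with partial pivoting.

    Returns (perm, lu) where lu stores the unit-lower-triangular L below
    the diagonal and U on and above it, and row i of L*U equals row
    perm[i] of the input. Illustrative only; not the paper's StreamLUD.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest magnitude in column k as pivot.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # Eliminate column k below the diagonal and update the trailing block.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]        # multiplier, stored as L[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return perm, lu


def reconstruct(perm, lu):
    """Multiply the packed L and U factors back together (for checking)."""
    n = len(lu)
    product = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]   # unit diagonal of L
                total += l_ik * lu[k][j]
            product[i][j] = total
    return product
```

The trailing-block update in the innermost loops is where nearly all of the arithmetic happens, and each elimination step consumes values produced by the previous one. That data dependence is the kind of producer-consumer pattern the abstract refers to, although how StreamLUD maps it onto the stream processor is not described here.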
