Implementation and Optimization of Dense LU Decomposition on the Stream Processor

Abstract

Developing scientific computing applications on stream processors has attracted considerable attention from researchers. In this paper, we implement and optimize dense LU decomposition on the stream processor. Unlike other existing parallel algorithms for LU decomposition, the StreamLUD algorithm aims to exploit producer-consumer locality and to overlap off-chip memory access with kernel execution. Simulation results show that, across matrices of different sizes, our optimized StreamLUD achieves a speedup of 2.56 to 3.64 over the LUD of HPL on an Itanium 2 processor.
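
For readers unfamiliar with the underlying computation, the following is a minimal sketch of dense LU decomposition with partial pivoting, the factorization that the abstract refers to. It is not the StreamLUD algorithm and does not model stream kernels or off-chip memory overlap, which the abstract does not describe in detail; the function name `lu_decompose` and the packed storage layout are assumptions made for illustration only.

```python
def lu_decompose(a):
    """Factor a square matrix as P*A = L*U using partial pivoting.

    Illustrative sketch only (not the paper's StreamLUD).
    Returns (perm, lu): lu packs the unit-lower-triangular L below the
    diagonal and U on and above it; perm[i] is the index of the original
    row that ends up at row i.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Choose the row with the largest magnitude in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Store the multipliers (column k of L) and update the trailing
        # submatrix, which later steps consume.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return perm, lu
```

Each elimination step produces a pivot row and a column of multipliers that the trailing-submatrix update immediately consumes, which gives some intuition for the kind of producer-consumer reuse the abstract mentions.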

Bibliographic record

Source
International Conference on Parallel Processing and Applied Mathematics, 2008, 11 pages
Authors
Ying Zhang; Tao Tang; Gen Li; Xuejun Yang
Original format
PDF
Chinese Library Classification
TP3-53
Keywords
Stream processor; LU decomposition; Kernels; Stream; Producer-consumer locality; Scientific computing
