Memory Hierarchy Exploration for Accelerating the Parallel Computation of SVDs

Mostafa I. Soliman

Abstract

The performance of many applications on modern computers is often limited by memory latency rather than by processor speed. On computers with a memory hierarchy, it is preferable to compute on blocks of data, which reduces the impact of memory latency by reusing data already loaded into cache. This paper proposes a fast algorithm, based on one-sided Jacobi, for computing the singular value decomposition (SVD) in parallel on architectures with multi-level memory hierarchies. On P parallel processors, the given matrix is divided into super-rows, and these super-rows are then partitioned into 2P blocks. One key point of the proposed algorithm is its heavy exploitation of the memory hierarchy: all computations are performed on super-rows loaded in cache memory rather than on individual rows. A second key point is that the number of sweeps required for convergence is very close to that of cyclic one-sided Jacobi. A third key point is that the number of sweeps required for convergence does not depend strongly on the size of the input matrix. On two dual-core Intel Xeon processors, our results show that the performance of the parallel implementation of the proposed algorithm is around 11 times higher than that of the sequential implementation on the same hardware. Moreover, a performance of around 10 GFLOPS (double precision) can be achieved on the target system using multi-threading, Intel SIMD instructions, matrix blocking, and loop unrolling.
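
As background for the abstract's comparison with cyclic one-sided Jacobi, the sketch below shows a plain, unblocked, sequential version of that baseline in Python. It is not the paper's algorithm: it has no super-rows, blocks, threads, or SIMD, and the function name, tolerance, and sweep limit are assumptions made for illustration. It orthogonalizes the rows of the matrix pairwise with plane rotations; once a full sweep applies no rotation, the row norms are the singular values.

```python
import math


def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Cyclic one-sided Jacobi on the rows of `a` (a list of equal-length rows).

    Returns (singular values in descending order, number of sweeps run).
    Hypothetical helper for illustration; only singular values are computed.
    """
    rows = [[float(x) for x in r] for r in a]  # work on a copy
    m = len(rows)
    sweeps = 0
    while sweeps < max_sweeps:
        sweeps += 1
        rotated = False
        # One sweep visits every row pair (i, j) with i < j in cyclic order.
        for i in range(m - 1):
            for j in range(i + 1, m):
                ri, rj = rows[i], rows[j]
                alpha = sum(x * x for x in ri)             # squared norm of row i
                beta = sum(y * y for y in rj)              # squared norm of row j
                gamma = sum(x * y for x, y in zip(ri, rj)) # inner product
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes the two new rows orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                rows[i] = [c * x - s * y for x, y in zip(ri, rj)]
                rows[j] = [s * x + c * y for x, y in zip(ri, rj)]
        if not rotated:
            break  # a full sweep without rotations: rows are orthogonal
    sigma = sorted((math.sqrt(sum(x * x for x in r)) for r in rows), reverse=True)
    return sigma, sweeps
```

For example, calling the function on `[[3.0, 0.0], [4.0, 5.0]]` yields singular values close to `math.sqrt(45)` and `math.sqrt(5)`. The sweep count it returns is the quantity the abstract's second and third key points refer to, measured here only for this unblocked baseline.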

Bibliographic record

Source
Neural, Parallel & Scientific Computations | 2008, Issue 4 | pp. 543-561 | 19 pages
Author
Mostafa I. Soliman
Author affiliation

Computer & System Section, Electrical Engineering Department, Faculty of Engineering, South Valley University, Aswan, Egypt

Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
memory hierarchy; multi-core computation; multi-threading techniques; parallel algorithms; performance evaluation; SIMD; SVD; one-sided Jacobi

Similar literature

1. Exploiting GPU memory hierarchy for accelerating a specialized stencil computation [J]. Thanasekhar Balaiah, Ranjani Parthasarathi. Concurrency, practice and experience. 2017, Issue 21.
2. Streaming Breakpoint Graph Analytics for Accelerating and Parallelizing the Computation of DCJ Median of Three Genomes [J]. Zhaoming Yin, Jijun Tang, Stephen W. Schaeffer. Procedia Computer Science. 2013, Issue 1.
3. Parallelizing message schedules to accelerate the computations of hash functions [J]. Shay Gueron, Vlad Krasnov. Journal of cryptographic engineering. 2012, Issue 4.
4. A Parallel-friendly Majority Gate to Accelerate In-memory Computation [C]. John Reuben, Stefan Pechmann. International Conference on Application-specific Systems, Architectures and Processors. 2020.
5. Using parallel computation to apply the singular value decomposition (SVD) in solving for large earth gravity fields based on satellite data [D]. Hinga, Mark Brandon. 2004.
6. Reply to Einarsson: The computational power of parallel network exploration with many bioagents [O]. Dan V. Nicolau Jr., Mercy Lard, Till Korten. 2016.
7. A Parallel-friendly Majority Gate to Accelerate In-memory Computation [O]. John Reuben, Stefan Pechmann. 2020.
8. Hierarchical Associative Memories for Parallel Computation [R]. Gertz, J. L. 1970.
