Journal: Parallel Processing Letters

Exploiting ILP, TLP, and DLP to Improve Multi-Core Performance of One-Sided Jacobi SVD

Abstract

This paper shows how the performance of singular value decomposition (SVD) is enhanced by exploiting instruction-level, thread-level, and data-level parallelism (ILP, TLP, and DLP) on Intel multi-core processors using superscalar execution, multi-threaded computation, and Streaming SIMD Extensions (SSE), respectively. To facilitate the exploitation of TLP on multiple execution cores, the well-known cyclic one-sided Jacobi algorithm is restructured to work in parallel. On two dual-core Intel Xeon processors with Hyper-Threading Technology running at 3.0 GHz, our results show that the multi-threaded implementation of one-sided Jacobi SVD runs about four times faster than the single-threaded superscalar implementation. Furthermore, the multi-threaded SIMD implementation speeds up the execution of single-threaded one-sided Jacobi by a factor of 10, which is close to the ideal speedup. For a reasonably large matrix that fits in the L2 cache, our results show that the target system achieves 11 GFLOPS in double precision through the exploitation of ILP, TLP, and DLP as well as the memory hierarchy.
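
The abstract does not spell out how the cyclic one-sided Jacobi algorithm is restructured, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a round-robin pair ordering, a common way to expose thread-level parallelism because the column pairs within one step touch disjoint columns. The names `round_robin_steps`, `rotate_pair`, and `jacobi_singular_values` are hypothetical, and the code is plain sequential Python rather than threaded SSE code.

```python
import math


def round_robin_steps(n):
    """Yield steps of column pairs; pairs within one step share no column.

    Assumed ordering (not taken from the paper): the classic circle method.
    For odd n a padding index is added and pairs that involve it are skipped.
    """
    m = n + (n % 2)
    idx = list(range(m))
    for _ in range(m - 1):
        step = []
        for i in range(m // 2):
            p, q = sorted((idx[i], idx[m - 1 - i]))
            if q < n:  # skip pairs that involve the padding index
                step.append((p, q))
        yield step
        # Keep the first index fixed and rotate the others by one position.
        idx = [idx[0], idx[-1]] + idx[1:-1]


def rotate_pair(cols, p, q, eps):
    """Orthogonalize columns p and q in place with one plane rotation.

    Returns True if a rotation was applied, False if the columns were
    already orthogonal to within the relative tolerance eps.
    """
    a, b = cols[p], cols[q]
    alpha = sum(x * x for x in a)             # squared norm of column p
    beta = sum(y * y for y in b)              # squared norm of column q
    gamma = sum(x * y for x, y in zip(a, b))  # dot product of the two columns
    if abs(gamma) <= eps * math.sqrt(alpha * beta):
        return False
    zeta = (beta - alpha) / (2.0 * gamma)
    t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
    c = 1.0 / math.sqrt(1.0 + t * t)
    s = c * t
    # These element-wise updates are the part that SIMD instructions
    # would process several values at a time.
    for k in range(len(a)):
        x, y = a[k], b[k]
        a[k] = c * x - s * y
        b[k] = s * x + c * y
    return True


def jacobi_singular_values(matrix, eps=1e-12, max_sweeps=50):
    """Return the singular values of a row-major matrix, largest first."""
    n = len(matrix[0])
    cols = [[row[j] for row in matrix] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for step in round_robin_steps(n):
            # Pairs in a step are independent, so a threaded version could
            # hand each pair to a different core (run sequentially here).
            for p, q in step:
                rotated = rotate_pair(cols, p, q, eps) or rotated
        if not rotated:
            break
    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols),
                  reverse=True)
```

For example, `jacobi_singular_values([[3.0, 0.0], [0.0, 4.0]])` returns `[4.0, 3.0]`, since the columns are already orthogonal and no rotation is needed. In this sketch the three levels of parallelism named in the abstract would map roughly as follows: independent pairs in a step to threads, the inner dot products and column updates to SIMD lanes, and the independent arithmetic inside each update to superscalar issue.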
