Journal: Engineering Analysis with Boundary Elements

On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm


Abstract

A publicly available collocation boundary element code for solving the three-dimensional Laplace equation has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix are performed on the GPU. Out-of-core techniques are used to solve problems larger than the available GPU memory. The code achieved about a 10-fold speedup in matrix assembly over a single CPU core and about 56 Gflop/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
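The abstract does not say how the out-of-core factorization is organized. As a rough illustration of the general idea only, the sketch below factors a dense matrix one column panel at a time, so that only the active panel and one previously factored panel are needed at once. It is a left-looking LU without pivoting, written in plain Python on the CPU; the function names `lu_by_column_panels` and `multiply_back` and the parameter `panel_width` are hypothetical, and nothing here should be read as the authors' GPU code, which presumably uses pivoting and tuned GPU kernels.

```python
def lu_by_column_panels(host_cols, panel_width):
    """Left-looking LU (no pivoting) that works on one column panel at a time.

    host_cols is the matrix in column-major form: host_cols[j][i] == A[i][j].
    Returns the packed factors in the same layout: U on and above the
    diagonal, the multipliers of the unit lower-triangular L below it.
    Assumption: the matrix needs no pivoting (e.g. it is diagonally dominant).
    """
    n = len(host_cols)
    out = [col[:] for col in host_cols]  # stands in for host (out-of-core) storage
    for p in range(0, n, panel_width):
        p_end = min(p + panel_width, n)
        # "Upload" the active panel; only these columns are modified.
        panel = [out[j][:] for j in range(p, p_end)]
        # Stream in each already-factored panel and apply its updates.
        for q in range(0, p, panel_width):
            for k in range(q, min(q + panel_width, p)):
                l_col = out[k]
                for c in panel:
                    u_kj = c[k]  # final value of U[k][j] at this point
                    for i in range(k + 1, n):
                        c[i] -= l_col[i] * u_kj
        # Factor the active panel in place.
        for idx, k in enumerate(range(p, p_end)):
            col = panel[idx]
            pivot = col[k]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            for i in range(k + 1, n):
                col[i] /= pivot  # multipliers, i.e. column k of L
            for c in panel[idx + 1:]:
                u_kj = c[k]
                for i in range(k + 1, n):
                    c[i] -= col[i] * u_kj
        out[p:p_end] = panel  # "download" the finished panel
    return out


def multiply_back(packed):
    """Rebuild A (column-major) from the packed L and U factors."""
    n = len(packed)
    cols = []
    for j in range(n):
        col = []
        for i in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if i == k else packed[k][i]
                total += l_ik * packed[j][k]
            col.append(total)
        cols.append(col)
    return cols
```

In this sketch `panel_width` plays the role of the memory budget: a narrower panel needs less working storage but re-reads the earlier factors more often, which is the usual trade-off in out-of-core schemes.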
