Conference: International Conference on Parallel Processing and Applied Mathematics

NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch


Abstract

Solving tridiagonal systems is one of the most computationally expensive parts of many applications, and multiple studies have explored the use of NVIDIA GPUs to accelerate this computation. However, these studies have mainly focused on parallel algorithms, which can efficiently exploit shared memory and saturate the GPU's capacity with a low number of systems, but scale poorly when dealing with a relatively high number of systems. We propose a new implementation (cuThomasBatch) based on the Thomas algorithm. To achieve good scalability with this approach, it is necessary to transform the way the inputs are stored in memory in order to exploit coalescence (contiguous threads accessing contiguous memory locations). The results of this study show that the implementation outperforms the reference code when dealing with a relatively large number of tridiagonal systems (2,000-256,000), being close to 3× (in double precision) and 4× (in single precision) faster on one Kepler NVIDIA GPU.
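
The storage transformation mentioned in the abstract can be illustrated with a small sequential sketch. This is not the cuThomasBatch code. It assumes that all systems have the same size n and that the transformed layout is interleaved, with element i of system s stored at index i * batch + s, so that on a GPU with one thread per system, neighbouring threads would touch neighbouring addresses at each step. The function names `interleave` and `thomas_batch_interleaved` are hypothetical, and the loop over s only stands in for the GPU threads.

```python
def interleave(systems):
    """Flatten equal-length vectors so that element i of system s
    lands at index i * batch + s (assumed interleaved layout)."""
    batch = len(systems)
    n = len(systems[0])
    flat = [0.0] * (n * batch)
    for s, vec in enumerate(systems):
        for i, v in enumerate(vec):
            flat[i * batch + s] = v
    return flat


def thomas_batch_interleaved(lower, diag, upper, rhs, n, batch):
    """Solve `batch` tridiagonal systems of size n with the Thomas algorithm.

    All four inputs use the interleaved layout produced by `interleave`.
    The first sub-diagonal entry and the last super-diagonal entry of each
    system are never read. No pivoting is performed, so the systems are
    assumed to be well conditioned (e.g. diagonally dominant).
    Returns the solutions in the same interleaved layout.
    """
    c = list(upper)  # modified super-diagonal
    d = list(rhs)    # modified right-hand side, overwritten by the solution
    for s in range(batch):  # stands in for "one GPU thread per system"
        # Forward elimination.
        c[s] = c[s] / diag[s]
        d[s] = d[s] / diag[s]
        for i in range(1, n):
            k = i * batch + s   # element i of system s
            p = k - batch       # element i - 1 of the same system
            denom = diag[k] - lower[k] * c[p]
            c[k] = c[k] / denom
            d[k] = (d[k] - lower[k] * d[p]) / denom
        # Back substitution.
        for i in range(n - 2, -1, -1):
            k = i * batch + s
            d[k] -= c[k] * d[k + batch]
    return d
```

In this layout, for a fixed row i the entries of all systems sit next to each other in memory, which is the access pattern that coalescing rewards; within one system, consecutive rows are `batch` elements apart.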
