Conference: 2011 25th IEEE International Parallel & Distributed Processing Symposium

An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

Abstract

We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage nature of our method, together with varied workloads and GPUs of differing capabilities, calls for an auto-tuning strategy that carefully selects the switch points between computation stages. In particular, we show two ways to effectively prune the tuning space and thus avoid an impractical exhaustive search: (1) apply algorithmic knowledge to decouple tuning parameters, and (2) estimate search starting points based on GPU architecture parameters. We demonstrate that auto-tuning is a powerful tool that improves performance by up to 5x, saves 17% and 32% of execution time on average over static and dynamic tuning, respectively, and enables our multi-stage solver to outperform the Intel MKL tridiagonal solver on many parallel tridiagonal systems by 6-11x.
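
The idea of splitting a large system into smaller independent ones and choosing a switch point between stages can be illustrated with a small CPU sketch. The abstract does not name the splitting algorithm, so this sketch assumes parallel cyclic reduction (PCR) for the splitting stage and the Thomas algorithm for the final stage. All function names (`thomas`, `pcr_step`, `multi_stage_solve`, `tune_split_steps`) are hypothetical, and the brute-force timing loop is only a stand-in for the pruned search the abstract describes. It is plain Python, not GPU code.

```python
import time


def thomas(a, b, c, d):
    """Sequential tridiagonal solve.
    a: sub-diagonal (a[0] == 0), b: diagonal, c: super-diagonal (c[-1] == 0)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr_step(a, b, c, d, s):
    """One PCR step with stride s. Before the step, equation i couples
    x[i-s], x[i], x[i+s]; afterwards it couples x[i-2s], x[i], x[i+2s]."""
    n = len(b)
    na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
    for i in range(n):
        lo, hi = i - s, i + s
        if lo >= 0:  # eliminate x[i-s] using equation lo
            k1 = a[i] / b[lo]
            na[i] = -a[lo] * k1
            nb[i] -= c[lo] * k1
            nd[i] -= d[lo] * k1
        if hi < n:  # eliminate x[i+s] using equation hi
            k2 = c[i] / b[hi]
            nc[i] = -c[hi] * k2
            nb[i] -= a[hi] * k2
            nd[i] -= d[hi] * k2
    return na, nb, nc, nd


def multi_stage_solve(a, b, c, d, split_steps):
    """Stage 1: split_steps PCR steps yield 2**split_steps independent systems.
    Stage 2: solve each small system with the Thomas algorithm."""
    n, stride = len(b), 1
    for _ in range(split_steps):
        a, b, c, d = pcr_step(a, b, c, d, stride)
        stride *= 2
    x = [0.0] * n
    for j in range(min(stride, n)):
        idx = range(j, n, stride)  # unknowns belonging to subsystem j
        sub = thomas([a[i] for i in idx], [b[i] for i in idx],
                     [c[i] for i in idx], [d[i] for i in idx])
        for i, xi in zip(idx, sub):
            x[i] = xi
    return x


def tune_split_steps(a, b, c, d, candidates):
    """Toy tuner: time each candidate switch point and keep the fastest.
    (Exhaustive over a tiny space; not the pruned search from the abstract.)"""
    best, best_time = None, float("inf")
    for k in candidates:
        t0 = time.perf_counter()
        multi_stage_solve(a, b, c, d, k)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best, best_time = k, elapsed
    return best
```

Here `split_steps` plays the role of a switch point: with `split_steps=0` the whole system goes straight to the sequential solver, while larger values produce more, smaller systems. On a GPU the trade-off would depend on shared memory size and workload, which is why the abstract argues for tuning it rather than fixing it.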
