Conference: IEEE International Symposium on Parallel & Distributed Processing

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures


Abstract

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-socket on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture.
