2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Abstract

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consider single- and double-precision arithmetic with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (all systems are dual-socket). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture.
