Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Abstract

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multi-core systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-sockets on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture.
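
The abstract lists "data structure transformations" among the optimizations without showing one. As a rough illustration only, the sketch below evaluates a direct 1/r sum (the kind of near-field kernel an FMM falls back to for neighboring boxes) with sources stored first as an array of structures and then as a structure of arrays. The function names, the 1/r kernel, and the choice of this particular layout change are assumptions made for the example, not details taken from the paper.

```python
import math

def direct_sum_aos(targets, sources):
    """Direct 1/r sum; each source is one (x, y, z, q) tuple (array of structures)."""
    phi = []
    for tx, ty, tz in targets:
        acc = 0.0
        for sx, sy, sz, q in sources:
            r2 = (tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2
            if r2 > 0.0:  # skip a source that coincides with the target
                acc += q / math.sqrt(r2)
        phi.append(acc)
    return phi

def to_soa(sources):
    """Hypothetical helper: split non-empty (x, y, z, q) tuples into four lists."""
    xs, ys, zs, qs = (list(col) for col in zip(*sources))
    return xs, ys, zs, qs

def direct_sum_soa(targets, xs, ys, zs, qs):
    """Same sum, with each coordinate in its own contiguous list (structure of arrays)."""
    phi = []
    for tx, ty, tz in targets:
        acc = 0.0
        for sx, sy, sz, q in zip(xs, ys, zs, qs):
            r2 = (tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2
            if r2 > 0.0:
                acc += q / math.sqrt(r2)
        phi.append(acc)
    return phi

targets = [(0.0, 0.0, 0.0)]
sources = [(1.0, 0.0, 0.0, 2.0), (0.0, 2.0, 0.0, 4.0)]
print(direct_sum_aos(targets, sources))           # [4.0]
print(direct_sum_soa(targets, *to_soa(sources)))  # [4.0]
```

In pure Python the two layouts run at similar speed. The point is only that the transformation leaves the result unchanged while giving each coordinate a unit-stride array, which is the form a vectorizing compiler or SIMD code in a compiled language would typically want.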

Bibliographic record

Source
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) | 2010 | pp. 1-12 | 12 pages
Conference location: Atlanta, GA (US)
Authors
Chandramowlishwaran A.; Williams S.; Oliker L.; Lashuk I.; Biros G.; Vuduc R.
Author affiliation
CRD, Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
Original format: PDF
Language: English
Chinese Library Classification: TP311.133

Similar documents

1. Dynamic autotuning of adaptive fast multipole methods on hybrid multicore CPU and GPU systems [J]. Marcus Holm, Stefan Engblom, Anders Goude. SIAM Journal on Scientific Computing. 2014, No. 4
2. The fast multipole method on parallel clusters, multicore processors, and graphics processing units [J]. Darve E., Cecka C., Takahashi T. Comptes Rendus Mécanique. 2011, No. 2-3
3. Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units [J]. Takahashi T., Cecka C., Fong W. International Journal for Numerical Methods in Engineering. 2012, No. 1
4. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures [C]. Chandramowlishwaran Aparna, Williams Samuel, Oliker Leonid. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 2010
5. Fast transforms based on structured matrices with applications to the fast multipole method [D]. Tang, Zhihui. 2004
6. Fast inverse scattering solutions using the distorted Born iterative method and the multilevel fast multipole algorithm [O]. Andrew J. Hesford, Weng C. Chew
7. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures [O]. Aparna Ch, Samuel Williams, Leonid Oliker. 2010
