Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Abstract

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multi-core systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-sockets on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture.

Bibliographic details

Source: IEEE International Symposium on Parallel & Distributed Processing, 2010, 12 pages
Authors: Chandramowlishwaran A.; Williams S.; Oliker L.; Lashuk I.; Biros G.; Vuduc R.
Original format: PDF
Chinese Library Classification: TP311.138-53

Similar literature

1. Dynamic Autotuning of Adaptive Fast Multipole Methods on Hybrid Multicore CPU and GPU Systems [J]. Marcus Holm, Stefan Engblom, Anders Goude. SIAM Journal on Scientific Computing. 2014, No. 4
2. The fast multipole method on parallel clusters, multicore processors, and graphics processing units [J]. Darve E., Cecka C., Takahashi T. Comptes Rendus Mécanique. 2011, No. 2-3
3. Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units [J]. Takahashi T., Cecka C., Fong W. International Journal for Numerical Methods in Engineering. 2012, No. 1
4. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures [C]. Chandramowlishwaran Aparna, Williams Samuel, Oliker Leonid. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 2010
5. Fast transforms based on structured matrices with applications to the fast multipole method [D]. Tang, Zhihui. 2004
6. Fast inverse scattering solutions using the distorted Born iterative method and the multilevel fast multipole algorithm [O]. Andrew J. Hesford, Weng C. Chew
7. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures [O]. Aparna Ch, Samuel Williams, Leonid Oliker. 2010
