Phantom-GRAPE: Numerical software library to accelerate collisionless N-body simulation with SIMD instruction set on x86 architecture

Tanikawa A.; Yoshikawa K.; Nitadori K.; Okamoto T.

Abstract

We have developed a numerical software library for collisionless N-body simulations named "Phantom-GRAPE", which greatly accelerates force calculations among particles by using a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). The library quickly computes not only Newtonian forces but also central forces with an arbitrary shape f(r) that has a finite cutoff radius r_cut (i.e. f(r) = 0 at r > r_cut). To compute such arbitrarily shaped central forces, we refer to a pre-calculated look-up table. We also present a new scheme for creating the look-up table, whose binning is optimized to keep good accuracy in the computed forces and whose size is small enough to avoid cache misses. Using an Intel Core i7-2600 processor, we measure the performance of our library for both Newtonian forces and arbitrarily shaped central forces. For Newtonian forces, we achieve 2 × 10^9 interactions per second with one processor core (or 75 GFLOPS if we count 38 operations per interaction), which is 20 times higher than the performance of an implementation without any explicit use of SIMD instructions, and 2 times higher than that of an implementation with the SSE instructions. With four processor cores, we obtain 8 × 10^9 interactions per second (or 300 GFLOPS). For arbitrarily shaped central forces, we can calculate 1 × 10^9 and 4 × 10^9 interactions per second with one and four processor cores, respectively. The single-core performance is 6 times and 2 times higher than that of implementations without any SIMD instructions and with the SSE instructions, respectively. These performance figures depend only weakly on the number of particles, irrespective of the force shape. This contrasts with force calculations accelerated by graphics processing units (GPUs), whose performance depends strongly on the number of particles. Such weak dependence on the number of particles suits collisionless N-body simulations, since these simulations are usually performed with sophisticated N-body solvers such as the Tree and TreePM methods combined with an individual timestep scheme. We conclude that collisionless N-body simulations accelerated with our library have a significant advantage over those accelerated by GPUs, especially in massively parallel environments.
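
The look-up-table idea described in the abstract can be illustrated with a minimal scalar sketch. This is not the library's code or API: the function names `build_table` and `accel_on`, the uniform binning in r^2, and the choice of Python instead of AVX intrinsics are all assumptions made for illustration. The abstract's own binning scheme is not specified on this page and is not reproduced here.

```python
import math


def build_table(f, r_cut, n_bins):
    """Tabulate f(r)/r at bin centres, with bins uniform in s = r^2.

    Uniform binning in r^2 is an assumption of this sketch, not the
    optimized binning scheme mentioned in the abstract.
    """
    ds = r_cut * r_cut / n_bins          # bin width in r^2
    table = []
    for k in range(n_bins):
        r = math.sqrt((k + 0.5) * ds)    # radius at the centre of bin k
        table.append(f(r) / r)           # store f(r)/r so no sqrt is needed later
    return table, ds


def accel_on(i_pos, j_pos, j_mass, table, ds):
    """Sum the table-based central force of all j-particles on one i-particle."""
    ax = ay = az = 0.0
    n_bins = len(table)
    for (xj, yj, zj), mj in zip(j_pos, j_mass):
        dx = xj - i_pos[0]
        dy = yj - i_pos[1]
        dz = zj - i_pos[2]
        s = dx * dx + dy * dy + dz * dz  # squared separation
        k = int(s / ds)                  # table index from r^2
        if s == 0.0 or k >= n_bins:      # skip self-interaction and r >= r_cut
            continue
        g = mj * table[k]                # m_j * f(r) / r
        ax += g * dx
        ay += g * dy
        az += g * dz
    return ax, ay, az


if __name__ == "__main__":
    # Hypothetical force shape: inverse-square, truncated at r_cut = 2.
    table, ds = build_table(lambda r: 1.0 / (r * r), r_cut=2.0, n_bins=4096)
    sources = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # the second lies beyond r_cut
    print(accel_on((0.0, 0.0, 0.0), sources, [2.0, 5.0], table, ds))
```

Indexing the table by r^2 keeps the square root out of the inner loop. The number of bins trades accuracy against memory footprint, which is the tension the abstract says its binning scheme is designed to resolve.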


Bibliographic record

Source
New Astronomy | 2013 | issue not listed | 15 pages
Authors
Tanikawa A.; Yoshikawa K.; Nitadori K.; Okamoto T.
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Astronomy
Keywords
Cosmology: large-scale structure of universe; Galaxies: formation; Method: N-body simulations; Stellar dynamics

