Journal: New Astronomy

Phantom-GRAPE: Numerical software library to accelerate collisionless N-body simulation with SIMD instruction set on x86 architecture

Abstract

We have developed a numerical software library for collisionless N-body simulations, named "Phantom-GRAPE", which greatly accelerates force calculations among particles by using Advanced Vector eXtensions (AVX), a new SIMD instruction set extension to the x86 architecture and an enhanced version of the Streaming SIMD Extensions (SSE). Our library can quickly compute not only Newton's forces but also central forces of arbitrary shape f(r) with a finite cutoff radius r_cut (i.e., f(r) = 0 for r > r_cut). To compute such central forces of arbitrary shape, we refer to a pre-calculated look-up table. We also present a new scheme for creating this look-up table, whose binning is optimal for keeping the computed forces accurate and whose size is small enough to avoid cache misses. Using an Intel Core i7-2600 processor, we measure the performance of our library for both Newton's forces and the arbitrarily shaped central forces. For Newton's forces, we achieve 2×10^9 interactions per second with one processor core (or 75 GFLOPS if we count 38 operations per interaction), which is 20 times higher than the performance of an implementation without any explicit use of SIMD instructions, and 2 times higher than that of an implementation using SSE instructions. With four processor cores, we obtain 8×10^9 interactions per second (or 300 GFLOPS). For the arbitrarily shaped central forces, we can calculate 1×10^9 and 4×10^9 interactions per second with one and four processor cores, respectively. The single-core performance is 6 times and 2 times higher than that of implementations without SIMD instructions and with SSE instructions, respectively. These performance figures depend only weakly on the number of particles, irrespective of the force shape. This contrasts sharply with force calculations accelerated by graphics processing units (GPUs), whose performance depends strongly on the number of particles.
Such a weak dependence of performance on the number of particles is well suited to collisionless N-body simulations, because these simulations are usually performed with sophisticated N-body solvers, such as Tree and TreePM methods combined with an individual timestep scheme, in which each force calculation involves a relatively small number of particles. We conclude that collisionless N-body simulations accelerated with our library have a significant advantage over those accelerated by GPUs, especially in massively parallel environments.
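For reference, the pairwise interaction that the library accelerates can be written as a plain double loop over "i-particles" (those receiving the force) and "j-particles" (those exerting it), the split used by GRAPE-style force calculators. The following is a minimal scalar sketch under stated assumptions, not Phantom-GRAPE's API: the function name `accelerations`, the Plummer-style softening parameter `eps2`, units with G = 1, and the sign convention that a positive f(r) pulls particle i toward particle j are all our own choices.

```python
import math


def accelerations(pos_i, pos_j, mass_j, eps2=0.0, shape=None, r_cut=None):
    """Acceleration on every i-particle due to all j-particles (G = 1; hypothetical helper).

    shape is None: softened Newtonian gravity,
        a_i = sum_j m_j d_ij / (r_ij^2 + eps2)^(3/2)
    shape is f:    arbitrary central force with a cutoff,
        a_i = sum_j m_j f(r_ij) d_ij / r_ij, with f(r) = 0 for r > r_cut
    d_ij = x_j - x_i, so a positive f(r) is attractive (assumed convention).
    """
    result = []
    for xi, yi, zi in pos_i:
        ax = ay = az = 0.0
        for (xj, yj, zj), mj in zip(pos_j, mass_j):
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz
            if r2 == 0.0:
                continue  # a particle exerts no force on itself
            if shape is None:
                inv_r = 1.0 / math.sqrt(r2 + eps2)  # softened 1/r
                coeff = mj * inv_r * inv_r * inv_r  # m_j / (r^2 + eps2)^(3/2)
            else:
                if r2 > r_cut * r_cut:
                    continue  # beyond the cutoff radius the force vanishes
                r = math.sqrt(r2)
                coeff = mj * shape(r) / r  # divide by r to turn d_ij into a unit vector
            ax += coeff * dx
            ay += coeff * dy
            az += coeff * dz
        result.append((ax, ay, az))
    return result


pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(accelerations(pos, pos, [1.0, 1.0]))
```

An AVX implementation of the same inner-loop body would typically hold several particles in the eight single-precision lanes of a 256-bit register and evaluate all lanes at once, replacing the cutoff `continue` with a masked select. That describes the general technique only, not Phantom-GRAPE's internal data layout.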
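The abstract does not spell out how the look-up table for f(r) is binned. As one plausible construction, which is our assumption and not necessarily the paper's scheme, the sketch below tabulates the shape as a function of s = (r/r_cut)^2, which avoids a square root per interaction, and derives the bin index directly from the IEEE-754 single-precision bit pattern of s. The exponent selects an octave and the leading `MANTISSA_BITS` mantissa bits select one of 2^MANTISSA_BITS equal-width bins within it, so bins are narrow where s is small while the table stays tiny. The names `build_table`, `lookup`, `MANTISSA_BITS`, and `EXP_MIN` are hypothetical, and values of s below 2^EXP_MIN are simply clamped to the first entry.

```python
import math
import struct

MANTISSA_BITS = 4  # hypothetical: 2**4 = 16 bins per factor of 2 in s
EXP_MIN = -12      # hypothetical: the table covers 2**-12 <= s < 1


def _float32_bits(x):
    """Bit pattern of x after rounding to IEEE-754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bin_edge(index):
    """Value of s at the left edge of bin `index`."""
    octave, top = divmod(index, 1 << MANTISSA_BITS)
    return math.ldexp(1.0 + top / (1 << MANTISSA_BITS), octave + EXP_MIN)


def build_table(shape_of_s):
    """Sample shape_of_s at every bin edge, including the final edge s = 1."""
    n_bins = -EXP_MIN * (1 << MANTISSA_BITS)
    return [shape_of_s(_bin_edge(k)) for k in range(n_bins + 1)]


def lookup(table, s):
    """Linearly interpolated table value at s = (r / r_cut)**2, or 0 outside the cutoff."""
    if s >= 1.0:
        return 0.0  # r >= r_cut is treated as outside the cutoff
    s = max(s, math.ldexp(1.0, EXP_MIN))  # clamp below the tabulated range (assumption)
    bits = _float32_bits(s)
    exponent = ((bits >> 23) & 0xFF) - 127  # unbiased binary exponent
    if exponent >= 0:
        return 0.0  # s rounded up to 1.0 in single precision
    shift = 23 - MANTISSA_BITS
    top = (bits >> shift) & ((1 << MANTISSA_BITS) - 1)  # leading mantissa bits pick the bin
    frac = (bits & ((1 << shift) - 1)) / (1 << shift)   # remaining bits: position inside the bin
    index = (exponent - EXP_MIN) * (1 << MANTISSA_BITS) + top
    return table[index] + frac * (table[index + 1] - table[index])


table = build_table(lambda s: math.exp(-s))
print(len(table), lookup(table, 0.37), math.exp(-0.37))
```

With these settings the table holds 12 × 16 + 1 = 193 entries, under 1 KB in single precision and far below the 32 KB L1 data cache of a Sandy Bridge core such as those in the i7-2600. For a shape given in terms of r, one would pass `lambda s: f(r_cut * math.sqrt(s))` to `build_table`, so the square root is paid once per table entry rather than once per interaction. Raising `MANTISSA_BITS` trades table size for interpolation accuracy.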
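The quoted throughput figures can be cross-checked with simple arithmetic. The sketch below (helper names `gflops` and `implied_rate` are ours) converts interaction rates into GFLOPS using the 38-operations-per-interaction convention from the abstract and backs out the rates implied by the quoted speedup factors. For example, 2×10^9 × 38 gives 76 GFLOPS, consistent with the quoted 75 GFLOPS given that the interaction rate is itself rounded, and the 20× and 2× speedups imply roughly 10^8 interactions per second without SIMD and 10^9 with SSE. The four-core rates are four times the single-core rates, indicating near-linear scaling over the four cores.

```python
OPS_PER_INTERACTION = 38  # operation-count convention quoted in the abstract


def gflops(interactions_per_second, ops_per_interaction=OPS_PER_INTERACTION):
    """Convert an interaction rate into GFLOPS under a fixed operation count."""
    return interactions_per_second * ops_per_interaction / 1e9


def implied_rate(measured_rate, speedup):
    """Interaction rate of a slower implementation, given the quoted speedup over it."""
    return measured_rate / speedup


# Rates quoted in the abstract (interactions per second, Intel Core i7-2600).
NEWTON_1CORE, NEWTON_4CORE = 2e9, 8e9
CENTRAL_1CORE, CENTRAL_4CORE = 1e9, 4e9

print("Newton, 1 core   :", gflops(NEWTON_1CORE), "GFLOPS")
print("Newton, 4 cores  :", gflops(NEWTON_4CORE), "GFLOPS")
print("Newton, no SIMD  :", implied_rate(NEWTON_1CORE, 20), "interactions/s")
print("Newton, SSE      :", implied_rate(NEWTON_1CORE, 2), "interactions/s")
print("Central, no SIMD :", implied_rate(CENTRAL_1CORE, 6), "interactions/s")
print("Central, SSE     :", implied_rate(CENTRAL_1CORE, 2), "interactions/s")
print("4-core scaling   :", NEWTON_4CORE / NEWTON_1CORE, CENTRAL_4CORE / CENTRAL_1CORE)
```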