Initial results on computational performance of Intel many integrated core, sandy bridge, and graphical processing unit architectures: implementation of a 1D c++/OpenMP electrostatic particle-in-cell code

Abstract

We present initial comparative performance results for Intel many integrated core (MIC), Sandy Bridge (SB), and graphical processing unit (GPU) architectures. A 1D explicit electrostatic particle-in-cell code is used to simulate a two-stream instability in plasma. We compare the computation times for various numbers of cores/threads and compiler options. The parallelization is implemented via OpenMP with a maximum thread number of 128. Parallelization and vectorization on the GPU are achieved by modifying the code syntax for compatibility with CUDA. We assess the speedup due to various auto-vectorization and optimization-level compiler options. Our results show that the MIC is several times slower than SB for a single thread, and it becomes faster than SB when the number of cores increases with vectorization switched on. The compute times for the GPU are consistently about six to seven times faster than the ones for MIC. Compared with SB, the GPU is about two times faster for a single thread and about an order of magnitude faster for 128 threads. The net speedups for MIC and GPU, however, are almost the same. An initial attempt to offload parts of the code to the MIC coprocessor shows that there is an optimal number of threads where the speedup reaches a maximum.
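
The abstract does not reproduce any of the paper's source code. As a rough illustration of the kind of loop that OpenMP parallelization targets in a 1D explicit electrostatic particle-in-cell code, the following is a minimal sketch, not the authors' implementation. The function name `push_particles`, its parameters, the linear field interpolation, the leapfrog update, and the periodic boundary are all assumptions chosen for illustration; positions are assumed to start in [0, L).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): advance all particles by one
// leapfrog step on a periodic 1D grid.
//   x  : particle positions, assumed to lie in [0, L)
//   v  : particle velocities
//   E  : electric field sampled at ng equally spaced grid nodes
//   qm : charge-to-mass ratio, dt : time step, L : domain length
void push_particles(std::vector<double>& x, std::vector<double>& v,
                    const std::vector<double>& E,
                    double qm, double dt, double L)
{
    const std::size_t ng = E.size();
    assert(ng > 0 && x.size() == v.size());
    const double dx = L / static_cast<double>(ng);
    const long np = static_cast<long>(x.size());

    // Each particle is updated independently, so the loop can be split
    // across threads. Without OpenMP enabled the pragma is ignored and
    // the loop runs serially with the same result.
    #pragma omp parallel for schedule(static)
    for (long p = 0; p < np; ++p) {
        const double s = x[p] / dx;                       // position in cell units
        const std::size_t i = static_cast<std::size_t>(s) % ng;  // left node
        const std::size_t j = (i + 1) % ng;               // right node (periodic)
        const double w = s - std::floor(s);               // weight of right node
        const double Ep = (1.0 - w) * E[i] + w * E[j];    // linear interpolation

        v[p] += qm * Ep * dt;                             // accelerate
        x[p] += v[p] * dt;                                // move
        x[p] -= L * std::floor(x[p] / L);                 // periodic wrap
    }
}
```

With a compiler that supports OpenMP, the thread count for such a loop is typically controlled through the `OMP_NUM_THREADS` environment variable, which is one way a sweep over thread counts like the one described above could be run.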