Journal: Cluster Computing

Performance analysis and comparison of cellular automata GPU implementations


Abstract

Cellular automata (CA) models are of interest to several scientific areas, and there is growing interest in exploring large systems that require high-performance computing. This work presents a CA implementation that performs well on five different NVIDIA GPU architectures, from Tesla to Maxwell, simulating systems with up to a billion cells. Using the Game of Life (GoL) and a more complex variation of GoL as examples, a performance of 5.58e6 evaluated cells/s is achieved. The two optimizations most often used in previous studies are shared memory and Multicell algorithms. Here, these optimizations do not improve performance on Fermi or newer architectures. Compared with an optimized serial CPU implementation, the GoL CA code running on an NVIDIA Titan X obtained a speedup of up to ~85x, and up to ~230x for the more complex CA. Finally, the efficiency of each GPU is analyzed in terms of cell performance per transistor and cell performance per unit of bandwidth, showing how the architectures have improved for this particular problem.
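For readers unfamiliar with the benchmark, the following is a minimal serial sketch of one GoL update together with the "evaluated cells/s" metric quoted in the abstract. It is not the authors' code: the function names `gol_step` and `cells_per_second` are hypothetical, the periodic (wrap-around) boundary is an assumption, and the metric is assumed to mean cells times steps divided by elapsed time. In a GPU version without the Multicell optimization, the body of the inner loop would typically be the work assigned to a single thread.

```python
def gol_step(grid):
    """Return the next Game of Life generation for a 2D list of 0/1 cells."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight Moore neighbours; edges wrap around
            # (periodic boundary, an assumption of this sketch).
            alive = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Standard GoL rule: birth on exactly 3 live neighbours,
            # survival on 2 or 3, death otherwise.
            nxt[r][c] = 1 if alive == 3 or (alive == 2 and grid[r][c]) else 0
    return nxt


def cells_per_second(n_cells, n_steps, seconds):
    """Throughput: cell evaluations per second (assumed definition)."""
    return n_cells * n_steps / seconds


if __name__ == "__main__":
    # A "blinker": three live cells in a row, which oscillates with period 2.
    grid = [[0] * 5 for _ in range(5)]
    grid[2][1] = grid[2][2] = grid[2][3] = 1
    for row in gol_step(grid):
        print("".join("#" if cell else "." for cell in row))
```

Under the same assumed definition, a speedup such as those reported above would be the ratio of the GPU throughput to the serial CPU throughput on the same system size.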
