MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems

Abstract

Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data movement frameworks, leaving applications with no direct mechanism for end-to-end data movement. We introduce MPI-ACC, an integrated and extensible framework that allows end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. MPI-ACC's runtime system enables several key optimizations, including pipelining of data transfers and balancing of communication based on accelerator and node architecture. We demonstrate the extensible design of MPI-ACC by using the popular CUDA and OpenCL accelerator programming interfaces. We examine the impact of MPI-ACC on communication performance and evaluate application-level benefits on a large-scale epidemiology simulation.
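
The abstract names pipelining of data transfers as one of the runtime optimizations but does not describe how it works. The sketch below is only an illustrative cost model, not MPI-ACC's implementation: it assumes a transfer has two stages (a GPU-to-host copy followed by a network send), that each stage has a fixed bandwidth, and that there is no per-chunk overhead. The function names and parameters (`unpipelined_time`, `pipelined_time`, `copy_bw`, `net_bw`, `chunk_bytes`) are hypothetical.

```python
def unpipelined_time(n_bytes, copy_bw, net_bw):
    """Copy the whole buffer to host memory, then send it over the network."""
    return n_bytes / copy_bw + n_bytes / net_bw


def pipelined_time(n_bytes, chunk_bytes, copy_bw, net_bw):
    """Split the buffer into chunks so the send of one chunk can overlap
    the copy of the next (assumed model: no per-chunk overhead)."""
    copy_done = 0.0  # time at which the latest chunk is staged in host memory
    send_done = 0.0  # time at which the latest chunk has left the node
    remaining = n_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        # Copies run back to back on the copy engine.
        copy_done += size / copy_bw
        # A send starts once its chunk is staged and the link is free.
        send_done = max(send_done, copy_done) + size / net_bw
        remaining -= size
    return send_done


if __name__ == "__main__":
    # Arbitrary illustrative units: 8 bytes, copy at 2 B/s, network at 1 B/s.
    print(unpipelined_time(8, 2, 1))
    print(pipelined_time(8, 2, 2, 1))
```

In this model the slower stage bounds the pipelined time, so chunking hides most of the cost of the faster stage. A real runtime would also have to weigh per-chunk launch and synchronization overhead when choosing the chunk size, which this sketch ignores.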
