Enabling PGAS Productivity with Hardware Support for Shared Address Mapping: A UPC Case Study


Abstract

The Partitioned Global Address Space (PGAS) programming model strikes a balance between the locality-aware but explicit message-passing model (e.g., MPI) and the easy-to-use but locality-agnostic shared-memory model (e.g., OpenMP). However, the rich PGAS memory model comes at a performance cost that can hinder its potential for scalability and performance. To contain this overhead and achieve full performance, compiler optimizations may not be sufficient, and manual optimizations are typically added. This, however, can severely limit the productivity advantage. Such optimizations are usually targeted at reducing address translation overheads for shared data structures. This paper proposes hardware architectural support for PGAS that allows the processor to handle shared addresses efficiently. This eliminates the need for such hand-tuning while maintaining the performance and productivity of PGAS languages. We propose to make this hardware support available to compilers by introducing new instructions to efficiently access and traverse the PGAS memory space. A prototype compiler is realized by extending the Berkeley Unified Parallel C (UPC) compiler. It allows unmodified code to use the new instructions without user intervention, thereby creating a truly productive programming environment. Two different implementations of the system are realized: the first uses the full-system simulator Gem5, which allows the performance gain to be evaluated. The second uses the Leon3 soft-core processor on an FPGA to verify implementability and to parameterize the cost of the new hardware and its instructions. The new instructions show promising results for the NAS Parallel Benchmarks implemented in UPC. A speedup of up to 5.5x is demonstrated for unmodified codes. The performance of unmodified code using this hardware was also shown to surpass that of manually optimized code by up to 10%.
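The address translation overhead mentioned in the abstract stems from the block-cyclic layout of UPC shared arrays: each access to an element of `shared [B] T a[N]` must be resolved to an owning thread, a phase within the block, and an offset in that thread's local memory. The following is a minimal sketch in C of that index arithmetic as software might perform it on every shared access. The type `shared_loc` and the function `locate` are hypothetical names chosen for illustration; they are not taken from the paper, its proposed instructions, or the Berkeley UPC runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decomposition of a UPC shared-array index.
 * Illustrative only; not the paper's or any runtime's actual data layout. */
typedef struct {
    size_t thread;      /* thread with affinity to the element */
    size_t phase;       /* position of the element inside its block */
    size_t local_index; /* element offset within that thread's local portion */
} shared_loc;

/* Map global index i of a block-cyclic array (block size block_size,
 * distributed over `threads` threads) to its location. */
static shared_loc locate(size_t i, size_t block_size, size_t threads)
{
    assert(block_size > 0 && threads > 0);

    size_t block = i / block_size;   /* global block number */
    shared_loc loc;
    loc.thread = block % threads;    /* blocks are dealt out round-robin */
    loc.phase = i % block_size;
    /* Each full round of `threads` blocks adds one block to every thread. */
    loc.local_index = (block / threads) * block_size + loc.phase;
    return loc;
}
```

For example, with a block size of 4 and 3 threads, `locate(13, 4, 3)` yields thread 0, phase 1, and local index 5. The divisions and modulo operations in this arithmetic are the kind of per-access cost that hand-tuning tries to avoid and that the paper proposes to handle in hardware.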