Journal: Journal of Parallel and Distributed Computing

HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems


Abstract

Global Address Space (GAS) programming models provide a convenient, shared-memory style of addressing. Typically this is realized through one-sided operations that enable asynchronous communication and data movement. With petascale systems reaching tens of thousands of nodes and hundreds of thousands of cores, the underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. Any solution that addresses these resource scalability challenges must also maintain the performance of GAS programming models. In this paper, we describe a Hierarchical Cooperation (HiCOO) architecture for scalable communication in GAS programming models. HiCOO formulates a cooperative communication architecture: inter-node cooperation among multiple nodes (a.k.a. a multinode) and hierarchical cooperation among multinodes that are arranged in various virtual topologies. We have implemented HiCOO for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI). By extensively evaluating different virtual topologies in HiCOO in terms of their impact on memory scalability, network contention, and application performance, we identify MFCG as the most suitable virtual topology. The resulting HiCOO architecture realizes scalable resource management and achieves resilience to network contention, while maintaining or enhancing the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.
