Scalable PGAS collective operations in NUMA clusters

Mallon Damian A.; Taboada Guillermo L.; Teijeiro Carlos; Gonzalez-Dominguez Jorge; Gomez Andres; Wibecan Brian

Abstract

The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and with processor core hierarchies, accessible via complex interconnects, in order to dispatch the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid the unnecessary synchronization between pairs of processes that arises in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C (UPC), a Partitioned Global Address Space (PGAS) language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.
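
To make the hierarchical-tree idea in the abstract more concrete, here is a minimal Python sketch of a two-level broadcast tree. It is an illustration only, not the authors' UPC implementation: the function names, the choice of the first thread on each node as its leader, the binomial shape across node leaders and the flat fan-out inside a node are all assumptions made for this example.

```python
from typing import Dict, Optional


def build_hierarchical_tree(num_threads: int,
                            threads_per_node: int) -> Dict[int, Optional[int]]:
    """Return a thread -> parent map for a two-level broadcast tree.

    Assumptions (hypothetical, not taken from the paper):
      * threads are numbered consecutively, node by node;
      * the first thread on each node acts as that node's leader;
      * leaders form a binomial tree rooted at node 0;
      * every other thread receives directly from its node leader.
    """
    parents: Dict[int, Optional[int]] = {}
    for thread in range(num_threads):
        node, local = divmod(thread, threads_per_node)
        if local != 0:
            # Intra-node edge: data is fetched from the local leader.
            parents[thread] = node * threads_per_node
        elif node == 0:
            # The leader of node 0 is the root of the whole tree.
            parents[thread] = None
        else:
            # Binomial tree: clear the lowest set bit of the node index.
            parent_node = node & (node - 1)
            parents[thread] = parent_node * threads_per_node
    return parents


def count_inter_node_edges(parents: Dict[int, Optional[int]],
                           threads_per_node: int) -> int:
    """Count tree edges whose endpoints live on different nodes."""
    return sum(
        1
        for child, parent in parents.items()
        if parent is not None
        and child // threads_per_node != parent // threads_per_node
    )


if __name__ == "__main__":
    tree = build_hierarchical_tree(num_threads=16, threads_per_node=4)
    for child, parent in sorted(tree.items()):
        print(f"thread {child:2d} <- {parent}")
    print("inter-node edges:", count_inter_node_edges(tree, 4))
```

With 16 threads on 4 nodes, only one transfer per non-root node crosses the interconnect in this sketch, while the remaining transfers stay inside a node. Keeping traffic off the interconnect in this way is the kind of bottleneck reduction the abstract attributes to hierarchical trees.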

Bibliographic details

Source: Cluster Computing, 2014, Issue 4, 23 pages
Authors: Mallon Damian A.; Taboada Guillermo L.; Teijeiro Carlos; Gonzalez-Dominguez Jorge; Gomez Andres; Wibecan Brian
Original format: PDF
Language: English
Chinese Library Classification: Molecular biology
Keywords: Manycore architectures; Collective operations; NUMA; UPC; PGAS; MPI; High performance computing; Communication algorithms

