A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters



Abstract

Standard offload programming models for the Xeon Phi, e.g., Intel LEO and OpenMP 4.0, are restricted to a single compute node and hence a limited number of coprocessors. Scaling applications across a Xeon Phi cluster/supercomputer thus requires hybrid programming approaches, usually MPI+X. In this work, we present a framework based on heterogeneous active messages (HAM-Offload) that provides the means to offload work to local and remote (co)processors using a unified offload API. Since HAM-Offload provides primitives similar to those of current local offload frameworks, existing applications can be easily ported to overcome the single-node limitation while keeping the convenient offload programming model. We demonstrate the effectiveness of the framework by using it to enable a real-world application from the field of molecular dynamics to use multiple local and remote Xeon Phis. The evaluation shows good scaling behavior. Compared with LEO, performance is equal for large offloads and significantly better for small offloads.


Bibliographic details

Source
International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 203-214 (12 pages)
Authors
Noack Marko; Wende Florian; Steinke Thomas; Cordes Frank
Original format: PDF
Keywords
application program interfaces; coprocessors; message passing; parallel machines; HAM-offload; Intel LEO; MPI+X; OpenMP 4.0; Xeon Phi clusters; compute node; coprocessors; heterogeneous active messages; hybrid programming approaches; inter-node offloading; intra-node offloading; molecular dynamics; scaling applications; scaling behavior; standard offload programming models; supercomputer; unified offload API; unified programming model; Computational modeling; Coprocessors; Data transfer; Libraries; Low earth orbit satellites; Performance evaluation; Programming;


