Conference on Priority Program Software for Exascale Computing

FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing

Abstract

The FFMK project designs, builds, and evaluates a system-software architecture to address the challenges expected in Exascale systems. In particular, these challenges include performance losses caused by the much larger impact of runtime variability within applications, hardware, and the operating system (OS), as well as increased vulnerability to failures. The FFMK OS platform is built upon a multi-kernel architecture, which combines the L4Re microkernel and a virtualized Linux kernel into a noise-free, yet feature-rich execution environment. It further includes global, distributed platform management and system-level optimization services that transparently minimize checkpoint/restart overhead for applications. The project also researched algorithms to make collective operations fault-tolerant in the presence of failing nodes. In this paper, we describe the basic components, algorithms, and services we developed in Phase 2 of the project.