Conference: 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops

A Resilient Framework for Iterative Linear Algebra Applications in X10

Abstract

The Global Matrix Library (GML) is a distributed matrix library in the X10 language. GML is designed to simplify the development of scalable linear algebra applications. Because GML hides communication and parallelism details, GML programs are written in a sequential style that non-expert programmers can easily use and understand. Resilience is becoming a major challenge for HPC applications as the number of components in a typical system continues to increase. To address this challenge, we improved GML's adaptability to process failure and provided a mechanism for automatic data recovery. As iterative algorithms are commonly used in linear algebra applications, we also created a checkpoint/restore framework for developing resilient iterative applications using GML. Using three example machine learning applications, we demonstrate that this framework supports resilient application development with minimal additional code compared to a non-resilient implementation. Performance measurements in a typical cluster environment show that the major cost of resilient execution is due to resilient X10 itself, and that the additional cost due to our framework is acceptable.
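The checkpoint/restore pattern for iterative algorithms mentioned in the abstract can be illustrated with a small, single-process sketch. It is written in Python for brevity and is not GML's or X10's API: `run_resilient`, `SimulatedFailure`, `jacobi_step`, and the `fail_at` parameter are hypothetical names introduced only for illustration, and a raised exception stands in for a process failure. The loop saves the state every few iterations and, on failure, rolls back to the last saved state and resumes from there.

```python
import copy


class SimulatedFailure(Exception):
    """Hypothetical stand-in for losing a process during an iteration."""


def run_resilient(state, step, is_finished, checkpoint_interval=2, fail_at=()):
    """Iterate `step` until `is_finished(state)`, rolling back on failure.

    Returns (final_state, steps_executed, restores).
    """
    pending_failures = set(fail_at)
    iteration = 0
    # Checkpoint the initial state so a rollback is always possible.
    saved_iteration, saved_state = iteration, copy.deepcopy(state)
    executed = 0
    restores = 0
    while not is_finished(state):
        try:
            if iteration in pending_failures:
                pending_failures.discard(iteration)  # each injected failure fires once
                raise SimulatedFailure(iteration)
            state = step(state)
            executed += 1
            iteration += 1
            if iteration % checkpoint_interval == 0:
                saved_iteration, saved_state = iteration, copy.deepcopy(state)
        except SimulatedFailure:
            # Restore: drop work done since the last checkpoint and resume from it.
            iteration, state = saved_iteration, copy.deepcopy(saved_state)
            restores += 1
    return state, executed, restores


def jacobi_step(state):
    """One Jacobi sweep for the system 4x + y = 6, x + 3y = 7 (solution x=1, y=2)."""
    x, y, k = state
    return ((6.0 - y) / 4.0, (7.0 - x) / 3.0, k + 1)


def done(state):
    return state[2] >= 50  # fixed iteration budget


baseline, baseline_steps, _ = run_resilient((0.0, 0.0, 0), jacobi_step, done)
recovered, recovered_steps, restores = run_resilient(
    (0.0, 0.0, 0), jacobi_step, done, checkpoint_interval=2, fail_at=(3, 10)
)
```

Because the step function is deterministic, the run with injected failures reaches the same final state as the failure-free run; it only re-executes the iterations lost since the last checkpoint. A shorter checkpoint interval reduces the amount of recomputation after a failure at the price of more frequent state copies.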
