International Workshop on Data-Intensive Scalable Computing Systems

Efficient, Failure Resilient Transactions for Parallel and Distributed Computing

Abstract

Scientific simulations are moving away from using centralized persistent storage for intermediate data between workflow steps and towards an all-online model. This shift is motivated by the slow growth of IO bandwidth relative to compute speed. The challenges presented by this shift to Integrated Application Workflows stem from the loss of persistent storage semantics for node-to-node communication. One step towards closing this semantics gap is to use transactions to logically delineate a data set from 100,000s of processes to 1000s of servers as an atomic unit. Our previously demonstrated Doubly Distributed Transactions (D2T) protocol offered a high-performance solution, but that work focused on typical-case performance and did not explore how to detect and recover from faults. The research presented here addresses fault detection and recovery based on an enhanced protocol design. The total overhead for a full transaction with multiple operations at 65,536 processes is on average 0.055 seconds. The fault detection and recovery mechanisms perform similarly to the success case, adding only the appropriate timeouts for the system. This paper explores the challenges in designing a recoverable protocol for doubly distributed transactions, particularly for parallel computing environments.
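
The abstract attributes fault detection to timeouts added around the normal transaction path. As a rough illustration of that general idea only, and not the paper's protocol, the Python sketch below has a coordinator collect commit votes and abort if any participant is silent past a deadline. The function name `run_transaction`, the vote callables, and the single-process threading model are all assumptions made for this example.

```python
import queue
import threading
import time


def run_transaction(participants, timeout_s):
    """Commit only if every participant votes yes before the deadline.

    `participants` maps a hypothetical participant name to a zero-argument
    callable returning True (ready to commit) or False (refuse).
    A participant that has not answered by the deadline is treated as
    failed, and the whole transaction aborts.
    """
    votes = queue.Queue()

    def ask(name, vote_fn):
        # Each vote is gathered on its own thread so one slow
        # participant cannot block the others.
        votes.put((name, vote_fn()))

    for name, vote_fn in participants.items():
        threading.Thread(target=ask, args=(name, vote_fn), daemon=True).start()

    deadline = time.monotonic() + timeout_s
    pending = set(participants)
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "abort", sorted(pending)
        try:
            name, ok = votes.get(timeout=remaining)
        except queue.Empty:
            # Deadline passed with votes still missing: suspected fault.
            return "abort", sorted(pending)
        if not ok:
            return "abort", [name]
        pending.discard(name)
    return "commit", []
```

In this sketch the success path pays nothing for the timeout beyond a deadline check per vote, which is in the spirit of the abstract's claim that fault handling performs similarly to the success case. A real deployment at the scale described would use hierarchical coordination across nodes rather than threads in one process.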
