Fault Tolerance For Main-Memory Applications In The Cloud

Abstract

Advances in hardware have enabled many long-running applications to execute entirely in main memory. With the emergence of cloud computing, thousands of machines can be made available to deploy such applications at lower operational and maintenance costs. While achieving substantially better performance, these applications face new challenges in achieving fault tolerance, that is, in ensuring durability in the event of a crash. In addition, many of these applications, such as massively multiplayer online games, main-memory OLTP systems, main-memory search engines, and deterministic transaction processing systems, must sustain extremely high update rates, often hundreds of thousands of updates per second. They also demand extremely high throughput (e.g., scientific simulation) or low latency (e.g., massively multiplayer online games). To support these demanding requirements, these applications have increasingly turned to database techniques. In this dissertation, we propose an approach to provide fault tolerance for main-memory applications without introducing excessive overhead or latency spikes. First, we evaluate the applicability of existing checkpoint recovery techniques developed for main-memory DBMSs. We use massively multiplayer online games (MMOs) as our motivating example. In particular, we show how to adapt consistent checkpointing techniques developed for main-memory databases to MMOs. Furthermore, we provide a thorough simulation model and evaluation of six recovery strategies. Based on our results, we argue that not all state-of-the-art checkpoint recovery techniques are equally suited for low-latency and high-throughput applications such as MMOs. These algorithms use either locks or large synchronous copy operations, which hurt throughput and latency, respectively.
Next, we take advantage of frequent points of consistency in many of these applications to develop novel checkpoint recovery algorithms that trade additional space in main memory for significantly lower overhead and latency. Compared to previous work, our new algorithms do not require any locking or bulk copies of the application state. Our experimental evaluation shows that one of our new algorithms attains nearly constant latency and reduces overhead by more than an order of magnitude for low to medium update rates. Additionally, in a heavily loaded main-memory transaction processing system, it still reduces overhead by more than a factor of two. Finally, we present BRRL, a library for making distributed main-memory applications fault tolerant. BRRL is optimized for cloud applications with frequent points of consistency that use data-parallelism to avoid complex concurrency control mechanisms. BRRL differs from existing recovery libraries by providing a simple table abstraction and using schema information to optimize checkpointing.
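
The abstract does not spell out the new algorithms, so the following is only an illustrative sketch of the general idea it describes: spending extra main memory so that a checkpoint can be captured at a point of consistency without locks or bulk copies of the state. Everything in it is an assumption made for illustration and is not taken from the dissertation or from BRRL: the class name `DoubleBufferedCheckpointer`, the two shadow buffers, the use of `None` to mark untouched slots, and the list `disk` standing in for stable storage. The flush step, which would normally run asynchronously, is called inline here to keep the example deterministic.

```python
class DoubleBufferedCheckpointer:
    """Hypothetical sketch: two extra buffers replace locks and bulk copies."""

    def __init__(self, size):
        self.state = [0] * size                      # live application state
        # Two shadow buffers; a slot is None unless it was written
        # during the period that the buffer covers.
        self.shadow = [[None] * size, [None] * size]
        self.current = 0                             # buffer receiving updates now
        self.disk = [0] * size                       # stand-in for stable storage

    def update(self, index, value):
        # Every update goes to the live state and to the current shadow buffer.
        # Assumption: application values are never None.
        self.state[index] = value
        self.shadow[self.current][index] = value

    def point_of_consistency(self):
        # Called at a tick boundary. Flipping an index freezes the buffer
        # that holds this period's updates; nothing is locked or copied.
        frozen = self.current
        self.current = 1 - self.current
        return frozen

    def flush(self, frozen):
        # Writes the frozen buffer's updates to stable storage and clears it.
        # Assumption: this completes before the next point of consistency.
        buf = self.shadow[frozen]
        for i, value in enumerate(buf):
            if value is not None:
                self.disk[i] = value
                buf[i] = None

    def recover(self):
        # After a crash, the last flushed checkpoint is the recovered state.
        return list(self.disk)


if __name__ == "__main__":
    cp = DoubleBufferedCheckpointer(4)
    cp.update(0, 10)
    cp.update(2, 30)
    frozen = cp.point_of_consistency()
    cp.update(0, 11)          # belongs to the next period
    cp.flush(frozen)
    print(cp.recover())       # checkpoint as of the first point of consistency
```

In this sketch the cost is two additional copies of the state, and an update that arrives after the flip never touches the frozen buffer, which is why the flush needs no lock. A real implementation would also need a safe handoff between the application thread and the writer thread.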

Bibliographic details

  • Author

    Cao Tuan

  • Author affiliation
  • Year 2013
  • Total pages
  • Original format PDF
  • Language en_US
  • Chinese Library Classification (CLC)
