...
首页> 外文期刊>Parallel and Distributed Systems, IEEE Transactions on >Efficient Algorithms for Global Snapshots in Large Distributed Systems
【24h】

Efficient Algorithms for Global Snapshots in Large Distributed Systems

机译:大型分布式系统中全局快照的高效算法

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Existing algorithms for global snapshots in distributed systems are not scalable when the underlying topology is complete. There are primarily two classes of existing algorithms for computing a global snapshot. Algorithms in the first class use control messages of size 0(1) but require O(N) space and O(N) messages per processor in a network with JV processors. Algorithms in the second class use control messages (such as rotating tokens with vector counter method) of size O(N), use multiple control messages per channel, or require recording of message history. As a result, algorithms in both of these classes are not efficient in large systems when the logical topology of the communication layer such as MPI is complete. In this paper, we propose three scalable algorithms for global snapshots: a grid-based, a tree-based, and a centralized algorithm. The grid-based algorithm uses O(N) space but only O(¿(N)) messages per processor each of size O(¿(N)). The tree-based and centralized algorithms use only O(1) size messages. The tree-based algorithm requires O(1) space and O(log N log(W/N)) messages per processor where W is the total number of messages in transit. The centralized algorithm requires O(1) space and O(log(W/N)) messages per processor. We also have a matching lower bound for this problem. We also present hybrid of centralized and tree-based algorithms that allow trade-off between the decentralization and the message complexity. Our algorithms have applications in checkpointing, detecting stable predicates, and implementing synchronizers.
机译:当基础拓扑完成时,分布式系统中用于全局快照的现有算法无法伸缩。用于计算全局快照的现有算法主要有两类。第一类算法使用大小为0(1)的控制消息,但在具有JV处理器的网络中,每个处理器需要O(N)空间和O(N)消息。第二类算法使用大小为O(N)的控制消息(例如使用矢量计数器方法旋转令牌),每个通道使用多个控制消息或需要记录消息历史记录。结果,当完成诸如MPI之类的通信层的逻辑拓扑时,这两种类中的算法在大型系统中效率都不高。在本文中,我们为全局快照提出了三种可伸缩算法:基于网格的,基于树的和集中式算法。基于网格的算法使用O(N)空间,但每个处理器每个大小O(¿(N))仅O(ƒÂ¢Â¿(N))消息。基于树的集中式算法仅使用O(1)大小的消息。基于树的算法需要每个处理器O(1)空间和O(log N log(W / N))条消息,其中W是传输中的消息总数。集中式算法每个处理器需要O(1)空间和O(log(W / N))消息。对于此问题,我们也有一个匹配的下限。我们还提出了集中式和基于树的算法的混合,允许在分散性和消息复杂性之间进行权衡。我们的算法可用于检查点,检测稳定谓词和实现同步器。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号