Efficient Algorithms for Global Snapshots in Large Distributed Systems

首页> 外文期刊>Parallel and Distributed Systems, IEEE Transactions on >Efficient Algorithms for Global Snapshots in Large Distributed Systems

【24h】

Efficient Algorithms for Global Snapshots in Large Distributed Systems

机译：大型分布式系统中全局快照的高效算法

获取原文

获取原文并翻译 | 示例

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Existing algorithms for global snapshots in distributed systems are not scalable when the underlying topology is complete. There are primarily two classes of existing algorithms for computing a global snapshot. Algorithms in the first class use control messages of size 0(1) but require O(N) space and O(N) messages per processor in a network with JV processors. Algorithms in the second class use control messages (such as rotating tokens with vector counter method) of size O(N), use multiple control messages per channel, or require recording of message history. As a result, algorithms in both of these classes are not efficient in large systems when the logical topology of the communication layer such as MPI is complete. In this paper, we propose three scalable algorithms for global snapshots: a grid-based, a tree-based, and a centralized algorithm. The grid-based algorithm uses O(N) space but only O(ÃÂÃÂ¿(N)) messages per processor each of size O(ÃÂÃÂ¿(N)). The tree-based and centralized algorithms use only O(1) size messages. The tree-based algorithm requires O(1) space and O(log N log(W/N)) messages per processor where W is the total number of messages in transit. The centralized algorithm requires O(1) space and O(log(W/N)) messages per processor. We also have a matching lower bound for this problem. We also present hybrid of centralized and tree-based algorithms that allow trade-off between the decentralization and the message complexity. Our algorithms have applications in checkpointing, detecting stable predicates, and implementing synchronizers.

机译：当基础拓扑完成时，分布式系统中用于全局快照的现有算法无法伸缩。用于计算全局快照的现有算法主要有两类。第一类算法使用大小为0（1）的控制消息，但在具有JV处理器的网络中，每个处理器需要O（N）空间和O（N）消息。第二类算法使用大小为O（N）的控制消息（例如使用矢量计数器方法旋转令牌），每个通道使用多个控制消息或需要记录消息历史记录。结果，当完成诸如MPI之类的通信层的逻辑拓扑时，这两种类中的算法在大型系统中效率都不高。在本文中，我们为全局快照提出了三种可伸缩算法：基于网格的，基于树的和集中式算法。基于网格的算法使用O（N）空间，但每个处理器每个大小O（ÃƒÂ‚Â¿（N））仅O（ƒÂ¢Â¿（N））消息。基于树的集中式算法仅使用O（1）大小的消息。基于树的算法需要每个处理器O（1）空间和O（log N log（W / N））条消息，其中W是传输中的消息总数。集中式算法每个处理器需要O（1）空间和O（log（W / N））消息。对于此问题，我们也有一个匹配的下限。我们还提出了集中式和基于树的算法的混合，允许在分散性和消息复杂性之间进行权衡。我们的算法可用于检查点，检测稳定谓词和实现同步器。

著录项

来源
《Parallel and Distributed Systems, IEEE Transactions on》 |2010年第5期|p.620-630|共11页
作者

展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类
关键词
Checkpointing; global snapshots; stable predicates.;

机译：检查点;全局快照;稳定谓词。;

相似文献

外文文献
中文文献
专利

1. Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems [J] . Kshemkalyani Ajay D. Parallel and Distributed Systems, IEEE Transactions on . 2010,第9期

机译：大型分布式系统的快速且消息有效的全局快照算法
2. Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems [J] . Tsai Jichiang Parallel and Distributed Systems, IEEE Transactions on . 2013,第3期

机译：大规模分布式系统的灵活对称全局快照算法
3. New global optimization algorithms based on multi-loop distributed control systems with serial structure and ring structure for solving global optimization problems [J] . Lu Baiquan, Zheng Zhongwei, Zhang Zhijun, Engineering Applications of Artificial Intelligence . 2021,第May期

机译：基于多环分布式控制系统的新全局优化算法，具有串行结构和环形结构，用于解决全球优化问题
4. Independent global snapshots in large distributed systems [C] . Sreenivas M.V., Bhalla S. High-Performance Computing, 1997. Proceedings. Fourth International Conference on . 1997

机译：大型分布式系统中的独立全局快照
5. Performance evaluation of distributed algorithms for verification of global predicates in distributed systems [D] . Joo, Jang. 1996

机译：用于验证分布式系统中全局谓词的分布式算法的性能评估
6. Information theory inspired optimization algorithm for efficient service orchestration in distributed systems [O] . Matheus Sant’Ana Lima 2021

机译：信息理论在分布式系统中有效服务编排的启发优化算法
7. Efficient algorithms for distributed snapshots and global virtual time approximation [O] . Friedemann Mattern 1993

机译：用于分布式快照和全局虚拟时间近似的高效算法

Efficient Algorithms for Global Snapshots in Large Distributed Systems

摘要

著录项

相似文献

相关主题

期刊订阅