Journal: Cluster Computing

Simulative performance analysis of gossip failure detection for scalable distributed systems




Three protocols for gossip-based failure detection services in large-scale heterogeneous clusters are analyzed and compared. The basic gossip protocol provides a means of detecting failures in large distributed systems asynchronously, without the limitations associated with reliable multicast for group communication. The hierarchical protocol leverages the underlying network topology to achieve faster failure detection. In addition to studying the effectiveness and efficiency of these two agreement protocols, we propose a third protocol that extends the hierarchical approach by piggybacking gossip information on application-generated messages. The protocols are simulated and evaluated with a fault-injection model for scalable distributed systems composed of clusters of workstations connected by high-performance networks, such as the CPlant system at Sandia National Laboratories. The model supports permanent and transient node and link failures, with rates specified at simulation time, for processors that operate in a fail-silent fashion. Through high-fidelity, CAD-based modeling and simulation, we demonstrate the strengths and weaknesses of each approach in terms of agreement time, number of gossips, and overall scalability.
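To make the basic (flat) gossip protocol more concrete, here is a minimal Python sketch of a heartbeat-counter gossip failure detector. It rests on assumptions of our own, not details taken from the paper: synchronous rounds, each live node sending its full heartbeat list to one uniformly random peer per round, a single fail-silent node, and a fixed suspicion timeout `t_fail` measured in rounds. The names `Node`, `simulate`, `fail_round`, and `t_fail` are hypothetical. "Agreement" is approximated here as the first round in which every live node suspects the failed node. The hierarchical and piggybacking variants are not modeled.

```python
import random


class Node:
    """A workstation running a flat gossip failure detector (illustrative)."""

    def __init__(self, node_id, n):
        self.node_id = node_id
        self.alive = True
        self.heartbeat = [0] * n    # highest heartbeat counter seen for each node
        self.last_update = [0] * n  # round in which that counter last increased

    def merge(self, other_heartbeat, now):
        # A larger counter than ours is fresh evidence that the node is alive.
        for j, hb in enumerate(other_heartbeat):
            if hb > self.heartbeat[j]:
                self.heartbeat[j] = hb
                self.last_update[j] = now

    def suspects(self, j, now, t_fail):
        # Suspect node j once its counter has not advanced for more than t_fail rounds.
        return now - self.last_update[j] > t_fail


def simulate(n=16, failed=3, fail_round=20, t_fail=10, seed=1, max_rounds=1000):
    """Return (agreement_round, gossips_sent), or (None, gossips) if no agreement."""
    rng = random.Random(seed)
    nodes = [Node(i, n) for i in range(n)]
    gossips = 0
    for now in range(1, max_rounds + 1):
        if now == fail_round:
            nodes[failed].alive = False  # fail-silent: no more heartbeats or gossip
        for node in nodes:
            if not node.alive:
                continue
            # Advance our own heartbeat, then gossip the whole list to one random peer.
            node.heartbeat[node.node_id] += 1
            node.last_update[node.node_id] = now
            peer = rng.choice([k for k in range(n) if k != node.node_id])
            gossips += 1
            if nodes[peer].alive:  # a gossip sent to a failed node is lost
                nodes[peer].merge(node.heartbeat, now)
        live = [nd for nd in nodes if nd.alive]
        if now >= fail_round and all(nd.suspects(failed, now, t_fail) for nd in live):
            return now, gossips
    return None, gossips


rounds, gossips = simulate()
print(f"all live nodes suspect node 3 in round {rounds}; {gossips} gossips sent")
```

The two returned values loosely mirror the abstract's metrics of agreement time and number of gossips. In this sketch, raising `t_fail` lowers the risk of falsely suspecting a slow but live node at the cost of slower detection; the code does not check for false suspicions.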


