Conference: Principles of distributed systems

A Fault Avoidance Strategy Improving the Reliability of the EGI Production Grid Infrastructure



Abstract

Reliability is a crucial issue for the development of stable and effective production grid infrastructures. That is, grid users must be able to trust the runtime services they request and receive from the underlying grid. Many runtime services and capabilities offered by modern grid infrastructures are not known to application developers in advance and are bound dynamically only at execution time, leading to an increased incidence of interaction faults. In this work we propose, implement and evaluate a novel low-impact fault-avoidance scheme, specifically conceived to improve grid reliability from the user/application point of view by providing proper service status information to the workload management system. In particular, starting from the EGEE experience, we designed a strategy that inhibits the use of specific runtime capabilities on the available resources as soon as the monitoring system detects any anomalous behavior associated with those capabilities, and re-integrates them once they resume working correctly. The results of a significant set of tests run on the production EGEE infrastructure are presented to show the effectiveness of our approach.
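
The inhibit/re-integrate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Resource` class, the `apply_probe_result` and `match` functions, and the capability names are all hypothetical, and a real workload management system would draw status from the grid's monitoring and information services rather than an in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A grid resource and the runtime capabilities it advertises (hypothetical model)."""
    name: str
    capabilities: set[str]
    inhibited: set[str] = field(default_factory=set)

    def usable(self) -> set[str]:
        # Capabilities currently visible to the workload management system.
        return self.capabilities - self.inhibited


def apply_probe_result(resource: Resource, capability: str, healthy: bool) -> None:
    """Inhibit a capability on an anomalous probe; re-integrate it on a healthy one."""
    if capability not in resource.capabilities:
        return  # Ignore probes for capabilities the resource never advertised.
    if healthy:
        resource.inhibited.discard(capability)
    else:
        resource.inhibited.add(capability)


def match(resources: list[Resource], required: set[str]) -> list[str]:
    """Return names of resources whose usable capabilities cover the job's requirements."""
    return [r.name for r in resources if required <= r.usable()]
```

In this sketch, a failed probe for a capability on one resource removes only that capability from matchmaking; the resource stays eligible for jobs that do not need it, and becomes eligible again for the inhibited capability after a later healthy probe.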
