
An architecture for checkpointing and migration of distributed components on the grid.



Abstract

A computational Grid is a set of hardware and software resources that provide seamless, dependable, and pervasive access to high-end computational capabilities. The Grid differs from other computational resources, such as traditional supercomputers and clusters, in the following key features: (1) coordination of resources that are not subject to centralized control, (2) use of standard, open, general-purpose protocols and interfaces, and (3) delivery of non-trivial qualities of service despite unpredictable resource availability.

The Open Grid Services Architecture (OGSA) is the first effort to standardize Grid functionality, based on concepts from the Web services community. However, the Web services based OGSA presents a server-centric approach, which is not very conducive to the orchestration of complex distributed applications where the interactions are not always of the client-server type. We present a distributed component based approach for composing complex applications on the Grid that conforms to the Common Component Architecture (CCA) while maintaining compatibility with Grid standards.

Because Grid resources are not subject to centralized control and are geographically distributed, their availability may be very dynamic. Migration of individual components can be an effective strategy for dealing with dynamic resource availability. However, migrating components that are part of a distributed application is complicated by the possible interactions between them during execution. We present an approach for migrating distributed components in the presence of communication between them. Additionally, the reliability of Grid resources is very difficult to guarantee. Checkpointing applications and rolling back to a saved state is an effective form of fault tolerance for dealing with failures of such resources. However, due to the distributed nature of the applications, the checkpoints generated need to be globally consistent. We present our approach for checkpointing and restart of distributed components for fault tolerance purposes.
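As an illustration of the "globally consistent" requirement mentioned in the abstract, the sketch below checks a set of per-component checkpoints for orphan messages, that is, messages recorded as received before the receiver's checkpoint but sent after the sender's checkpoint. This is the standard textbook notion of a consistent cut, not the method described in the thesis. The names `Message`, `orphan_messages`, and `is_globally_consistent`, and the use of per-component event counters, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A message exchanged between two components (hypothetical model)."""
    sender: str
    receiver: str
    send_seq: int  # sender's local event counter when the message was sent
    recv_seq: int  # receiver's local event counter when it was received


def orphan_messages(checkpoints, messages):
    """Return messages that make the set of checkpoints inconsistent.

    `checkpoints` maps each component name to the value of its local event
    counter at the moment its checkpoint was taken. A message is an orphan
    if the receiver's checkpoint includes its receipt while the sender's
    checkpoint does not include its send.
    """
    return [
        m for m in messages
        if m.recv_seq <= checkpoints[m.receiver]
        and m.send_seq > checkpoints[m.sender]
    ]


def is_globally_consistent(checkpoints, messages):
    """A set of local checkpoints is consistent if it has no orphan messages."""
    return not orphan_messages(checkpoints, messages)


if __name__ == "__main__":
    # Component A sends at its event 5; component B receives at its event 3.
    msgs = [Message(sender="A", receiver="B", send_seq=5, recv_seq=3)]

    # A checkpointed before sending, B after receiving: rollback would leave
    # B holding a message that A never sent.
    print(is_globally_consistent({"A": 4, "B": 3}, msgs))

    # A checkpointed after sending: the receipt is explained by the send.
    print(is_globally_consistent({"A": 5, "B": 3}, msgs))
```

A real system also has to handle messages that are in transit when the checkpoints are taken (sent before the sender's checkpoint but not yet received); this sketch deliberately ignores that case.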

Bibliographic details

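  • Author

    Krishnan, Sriram.

  • Author affiliation

    Indiana University.

  • Degree-granting institution: Indiana University.
  • Discipline: Computer Science.
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 147 p.
  • Total pages: 147
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology
  • Date indexed: 2022-08-17 11:43:39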
