Checkpointing a multithreaded distributed shared memory computer system.

Abstract

Distributing a program over a cluster of commodity processors connected by a commodity network can speed up a computation at relatively low cost. Distributed cluster computing is especially useful for long-running scientific applications. As the number of processors and the running time of a program increase, however, so does the probability that one of the system's components will fail before the program ends. A program can prepare for failures by periodically saving its state in a checkpoint from which it can later be recovered.

Checkpointing distributed programs requires ensuring that the checkpoints saved by individual processes can be used together to restore a consistent state. Programs using a coordinated checkpointing algorithm communicate explicitly to save a consistent state. Programs using a communication-induced checkpointing algorithm build a consistent state without explicit communication. Although communication-induced checkpointing algorithms have less communication overhead, they do not add significantly less overhead to programs, because synchronization overhead is small compared with the time required to save a checkpoint to disk.

A checkpointing system builds consistent global checkpoints from the checkpoints of individual processes. Each Unify process has multiple threads, but at the start of this research no checkpointing library existed that could checkpoint multithreaded programs. This research includes the development of a checkpointing library that checkpoints multithreaded processes on Solaris 2.5 and Linux. The library can be used by Unify or as a standalone checkpointing library for multithreaded processes.
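
The idea of coordinated checkpointing in a multithreaded process can be illustrated with a small sketch. This is not the thesis's library and not Unify's interface; the class `CheckpointedSum`, its methods, and the toy workload are hypothetical names chosen for illustration, and the checkpoint is kept in memory where a real system would write it to disk. All threads stop at a barrier, exactly one of them serializes the shared state while the others are blocked, and the job can later be restarted from any saved snapshot.

```python
import pickle
import threading


class CheckpointedSum:
    """Hypothetical toy workload: threads add numbers and checkpoint together."""

    def __init__(self, num_threads, steps, interval, state=None):
        self.num_threads = num_threads
        self.steps = steps
        self.interval = interval  # take a checkpoint every `interval` steps
        self.state = state or {"step": 0, "partial": [0] * num_threads}
        self.snapshots = []  # in-memory stand-in for checkpoint files on disk
        # The barrier action runs in one thread after every thread has arrived
        # and before any thread is released, so no thread is mid-update.
        self.barrier = threading.Barrier(num_threads, action=self._save)

    def _save(self):
        # Record how many steps are complete, then serialize the whole state.
        self.state["step"] += self.interval
        self.snapshots.append(pickle.dumps(self.state))

    def _work(self, tid, first_step):
        for step in range(first_step, self.steps):
            # Each thread updates only its own slot of the shared state.
            self.state["partial"][tid] += step * self.num_threads + tid
            if (step + 1) % self.interval == 0:
                self.barrier.wait()  # coordinate: all threads pause here

    def run(self):
        first_step = self.state["step"]  # 0 for a fresh run, >0 after restore
        threads = [
            threading.Thread(target=self._work, args=(tid, first_step))
            for tid in range(self.num_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sum(self.state["partial"])

    @classmethod
    def restore(cls, blob, num_threads, steps, interval):
        # Rebuild a job from a serialized checkpoint.
        return cls(num_threads, steps, interval, state=pickle.loads(blob))


if __name__ == "__main__":
    job = CheckpointedSum(num_threads=4, steps=10, interval=3)
    total = job.run()
    # Pretend the process failed after the second checkpoint and restart there.
    resumed = CheckpointedSum.restore(job.snapshots[1], 4, 10, 3)
    print(total, resumed.run())
```

In this sketch the barrier plays the role of the explicit coordination step: its cost is small next to serializing the state, which mirrors the abstract's observation that saving the checkpoint dominates synchronization overhead.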

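Bibliographic record

Author: Dieter, William Robert
Author affiliation: University of Kentucky
Degree-granting institution: University of Kentucky
Subject: Computer Science; Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2001
Pages: 109 p.
Total pages: 109
Original format: PDF
Language of text: English
Chinese Library Classification: Automation and computer technology; radio electronics and telecommunications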

Similar literature

1. Distributed Shared Arrays: Portable Shared-Memory Programming Interface for Multiple Computer Systems [J] . Akira Nomoto, Yasuo Watanabe, Wataru Kaneko, Cluster Computing . 2004, No. 1
2. A New Co-Ordinated Checkpointing and Rollback Recovery Scheme for Distributed Shared Memory Clusters [J] . Minakshi Tripathy, C.R. Tripathy, International Journal of Distributed and Parallel Systems . 2011, No. 1
3. A checkpointing algorithm for an SCI based distributed shared memory system [J] . S. Kalaiselvi, V. Rajaraman, Microprocessors and Microsystems . 1999, No. 9
4. Multithreaded self-scheduling: application of multithreading on loop scheduling for distributed shared memory multiprocessor [C] . Hung, K.P., Yung, . 2001
5. Application of distributed shared memory to metadata storage in a parallel file system. [D] . Wolinski, Pawel D. 2005
6. Performance of parallel FDTD method for shared- and distributed-memory architectures: Application to bioelectromagnetics [O] . Miguel Ruiz-Cabello N., Maksims Abaļenkovs, Luis M. Diaz Angulo, 2020
7. Multithreaded self-scheduling: application of multithreading on loop scheduling for distributed shared memory multiprocessor [O] . Hung KP, Cheung YS, Yung NHC 1995
8. Evaluation of Multithreading and Caching in Large Shared Memory Parallel Computers. [R] . Boothe, R. F. 1993
