
Fault-tolerant techniques for high performance computing and a bioinformatics application.



Abstract

Computational clusters have long provided a mechanism for accelerating high performance computing (HPC) applications. As today's supercomputers approach the petaflop scale, however, they are also becoming more heterogeneous. This heterogeneity spans a range of technologies, from multiple operating systems to hardware accelerators and novel architectures. Because some of these heterogeneous architectures provide exceptional acceleration, they are being embraced as viable tools for HPC applications, particularly in the area of biological sequence analysis.

In this dissertation we study two of the resulting challenges in detail. We begin with the HMMER sequence analysis suite, which uses a readily parallelizable algorithm based on profile hidden Markov models. To date, however, HMMER has seen only limited use in the HPC setting because it relies on PVM for parallelization. We develop a more scalable distributed implementation of HMMER, called MPI-HMMER, and extend it to use multiple FPGAs for greater acceleration.

The heterogeneous aspect of the acceleration brings to the forefront the second challenge studied in this dissertation: fault tolerance and checkpointing for HPC systems. To address the challenges of HPC checkpointing, we develop a fault-tolerant MPI based on LAM/MPI with asynchronous replication and checkpoint migration, eliminating the need for central or network storage and allowing MPI topologies to be reconfigured in the event of node failure. We evaluate centralized storage, SAN-based solutions, and a commercial parallel file system-based solution and show that they are not scalable. As a result, we show that our replication-based checkpointing/migration system is uniquely capable of handling the large amount of data generated by a supercomputing application's checkpoint.

As a first step towards supporting the checkpointing of heterogeneous systems, we then explore the idea of using virtualization for high performance computing. Using OpenVZ, we demonstrate that checkpointing virtualized computational clusters is indeed feasible with relatively low overhead. By adapting the idea of checkpoint replication to the virtual environment, we eliminate any need for network storage or centralized servers, and reduce the impact of checkpointing on non-participating cluster nodes and users.


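Bibliographic details

Author: Walters, John Paul N.
Author affiliation: Wayne State University, Computer Science
Degree-granting institution: Wayne State University, Computer Science
Subject: Biology, Bioinformatics; Computer Science
Degree: Ph.D.
Year: 2007
Pages: 174 p.
Total pages: 174
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology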

