Journal of Bioinformatics and Computational Biology

Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer

Abstract

The rapid expansion of online resources providing access to genomic, structural, and functional information on biological macromolecules creates an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to make optimal use of computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations that take at most several hours to analyze a typical input on a modern desktop workstation; however, because they must be invoked for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a careful parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new software tool, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within a parallel supercomputing environment. The Message Passing Interface (MPI) is used to exchange information between nodes. Two specialized threads (one for task management and communication, and another for subtask execution) are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source code, and it supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper.
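
The workflow idea in the abstract (unmodified programs run as independent subtasks, task management kept separate from subtask execution, failed subtasks resubmitted) can be illustrated with a small single-machine sketch. This is not the mpiWrapper code and it does not use MPI: Python threads stand in for processing units, in-memory queues stand in for MPI messages, and a non-zero exit code stands in for a node failure. The function name `run_subtasks` and the parameters `n_workers` and `max_attempts` are hypothetical, chosen only for this illustration.

```python
import queue
import subprocess
import sys
import threading


def run_subtasks(commands, n_workers=2, max_attempts=2):
    """Run each command (an argv list) as an independent subtask.

    Returns {task_index: exit_code} once every subtask has either
    succeeded or used up max_attempts. Hypothetical helper, not part
    of mpiWrapper.
    """
    pending = queue.Queue()  # manager -> executors: (index, attempt)
    reports = queue.Queue()  # executors -> manager: (index, attempt, code)
    for i in range(len(commands)):
        pending.put((i, 1))

    def executor():
        # Subtask-execution role: wait for work, run the unmodified
        # external program, and report its exit code to the manager.
        while True:
            item = pending.get()
            if item is None:  # shutdown signal from the manager
                return
            index, attempt = item
            try:
                code = subprocess.run(commands[index]).returncode
            except OSError:
                code = 127  # program could not be started
            reports.put((index, attempt, code))

    workers = [threading.Thread(target=executor) for _ in range(n_workers)]
    for w in workers:
        w.start()

    # Task-management role: collect reports and resubmit failed subtasks
    # until they succeed or run out of attempts.
    results = {}
    while len(results) < len(commands):
        index, attempt, code = reports.get()
        if code != 0 and attempt < max_attempts:
            pending.put((index, attempt + 1))
        else:
            results[index] = code

    for _ in workers:
        pending.put(None)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Three toy subtasks; the second always exits with status 3.
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_subtasks([ok, bad, ok]))
```

Because the executors only ever block on the work queue and the manager only ever blocks on the report queue, a long-running subtask never prevents the manager from receiving reports. This loosely mirrors the abstract's reason for giving communication and execution separate threads. A real MPI version would replace the queues with messages between ranks.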