PLoS Computational Biology

Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources

Abstract

The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the field of bioinformatics. Access to high-performance computing resources is also necessary to obtain results in reasonable time. To speed up applications and use the available compute resources as efficiently as possible, software developers employ parallelization mechanisms such as multithreading. Many bioinformatics tools offer multithreading, but more compute power does not always help. In this study, we used our benchmarking tool suite BOOTABLE to investigate how well-known bioinformatics applications perform in terms of scaling, across different virtual environments and with different datasets. The suite includes BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application built on the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics, but it is applied to many biological problems, especially in the analysis of medical images such as X-ray photographs. The tools were analyzed in two virtual environments: a virtual machine environment based on the OpenStack cloud software, and a Docker environment. The measured performance values were compared with a bare-metal setup and with each other. The study shows that the virtual environments used introduce an overhead of seven to twenty-five percent compared with the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from larger amounts of computing resources, whereas others scale almost linearly. The findings have been generalized as far as possible and should help users find the appropriate amount of resources for their analyses. Furthermore, the results provide valuable information for resource providers on handling their resources as efficiently as possible, and they raise the user community's awareness of the efficient use of computing resources.
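The two quantities the abstract reports, virtualization overhead relative to bare metal and scaling with more threads, can be made concrete with a short calculation. The sketch below is not the authors' BOOTABLE code; it assumes overhead is defined as the relative increase in wall-clock runtime over bare metal, and it expresses scaling as speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. All runtimes, and the names `env_1`, `env_2`, `tool_A` and `tool_B`, are hypothetical placeholders rather than measurements from the study.

```python
"""Sketch: quantify virtualization overhead and thread scaling from runtimes.

All runtimes and labels below are hypothetical illustration values,
not results reported in the study.
"""


def overhead_percent(virtual_runtime: float, bare_metal_runtime: float) -> float:
    """Relative runtime increase of a virtual environment over bare metal, in %."""
    return (virtual_runtime - bare_metal_runtime) / bare_metal_runtime * 100.0


def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p)."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, threads: int) -> float:
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(t_single, t_parallel) / threads


if __name__ == "__main__":
    # Hypothetical wall-clock times (seconds), chosen to land at the two ends
    # of the 7-25 % overhead range; not tied to any specific environment.
    bare_metal = 100.0
    virtual_runs = {"env_1": 107.0, "env_2": 125.0}
    for name, runtime in virtual_runs.items():
        print(f"{name}: {overhead_percent(runtime, bare_metal):.1f}% overhead")

    # Hypothetical runtimes (seconds) per thread count for two tools.
    runtimes = {
        "tool_A (near-linear)": {1: 800.0, 2: 405.0, 4: 205.0, 8: 104.0},
        "tool_B (saturating)": {1: 800.0, 2: 520.0, 4: 430.0, 8: 410.0},
    }
    for tool, by_threads in runtimes.items():
        t1 = by_threads[1]
        for p, tp in sorted(by_threads.items()):
            print(f"{tool}: p={p} speedup={speedup(t1, tp):.2f} "
                  f"efficiency={efficiency(t1, tp, p):.2f}")
```

An efficiency that stays close to 1.0 as threads are added corresponds to the near-linear scaling the abstract mentions, while an efficiency that falls well below 1.0, as for the hypothetical `tool_B`, marks the point beyond which extra cores mostly sit idle.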