Algorithms for Molecular Biology

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing

Abstract

Background: Multiple sequence alignment (MSA) plays a key role in biological sequence analysis, especially in phylogenetic tree construction. The rapid growth of next-generation sequencing data has exposed a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types.

Methods: Distributed and parallel computing is a crucial technique for accelerating ultra-large sequence analyses (e.g., input files larger than 1 GB). Building on HAlign and the Spark distributed computing system, we implemented HAlign-II, a highly cost- and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction.

Results: Experiments on large-scale DNA and protein datasets (files larger than 1 GB) showed that HAlign-II saves both time and space and outperforms current software tools. HAlign-II can efficiently perform MSA and construct phylogenetic trees for ultra-large numbers of biological sequences. It shows extremely high memory efficiency and scales well as computing resources increase.

Conclusions: HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign .
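
The abstract does not describe the alignment algorithm itself. As a minimal sketch, assuming a center-star strategy (the approach the original HAlign is generally associated with), the code below picks the sequence with the highest total pairwise score as the center, aligns every other sequence to it with plain Needleman-Wunsch, and merges the pairwise results so that any gap opened in the center is kept in every row. The function names and scoring parameters are hypothetical; this is not HAlign-II's implementation, which the abstract says runs on Spark.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); gaps are written as '-'.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        diag = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def split_by_center(c_aln, s_aln, center_len):
    """Split a (center, seq) pairwise alignment into per-slot insertions
    (seq characters opposite center gaps) and one column per center residue."""
    inserts = [[] for _ in range(center_len + 1)]
    cols = []
    k = 0  # index of the next center residue
    for cc, sc in zip(c_aln, s_aln):
        if cc == "-":
            inserts[k].append(sc)
        else:
            cols.append(sc)
            k += 1
    return inserts, cols


def center_star(seqs):
    """Naive center-star MSA (hypothetical helper, not HAlign-II's API)."""
    n = len(seqs)
    totals = [0] * n
    for i in range(n):  # all O(n^2) pairwise scores, used only to pick the center
        for j in range(i + 1, n):
            s = nw_align(seqs[i], seqs[j])[0]
            totals[i] += s
            totals[j] += s
    c = max(range(n), key=lambda i: totals[i])
    center = seqs[c]
    L = len(center)
    parts = []
    for i, s in enumerate(seqs):  # each alignment to the center is independent
        if i == c:
            parts.append(([[] for _ in range(L + 1)], list(center)))
        else:
            _, c_aln, s_aln = nw_align(center, s)
            parts.append(split_by_center(c_aln, s_aln, L))
    # "Once a gap, always a gap": each slot gets the widest insertion seen.
    width = [max(len(ins[k]) for ins, _ in parts) for k in range(L + 1)]
    rows = []
    for ins, cols in parts:
        row = []
        for k in range(L + 1):
            row.append("".join(ins[k]).ljust(width[k], "-"))
            if k < L:
                row.append(cols[k])
        rows.append("".join(row))
    return rows


for row in center_star(["ACGT", "ACGGT", "AGT"]):
    print(row)
```

Because each sequence is aligned to the center independently, that loop is the natural unit to spread across workers in a distributed setting such as Spark; this sketch simply runs it serially.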
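
The abstract also does not say which tree-building method HAlign-II uses. Purely as an illustration, the sketch below assumes neighbor-joining (NJ), a standard distance-based method for building trees from many sequences. The function name, the internal node labels (`node0`, `node1`, ...) and the edge-list output format are hypothetical choices, and this is not HAlign-II's implementation.

```python
def neighbor_joining(labels, dist):
    """Build an unrooted tree by neighbor-joining (hypothetical helper).

    labels: leaf names; dist: symmetric distance matrix (list of lists).
    Returns a list of edges (node_a, node_b, branch_length).
    Requires at least two leaves.
    """
    nodes = list(labels)
    D = [list(row) for row in dist]
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in D]
        # Pick the pair minimizing the Q criterion (n-2)*D[i][j] - r[i] - r[j].
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the joined pair to their new parent u.
        d_iu = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        d_ju = D[i][j] - d_iu
        u = f"node{next_id}"
        next_id += 1
        edges.append((u, nodes[i], d_iu))
        edges.append((u, nodes[j], d_ju))
        # Distances from u to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
        D = [[D[a][b] for b in keep] for a in keep]
        for row, d in zip(D, new_row):
            row.append(d)
        D.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [u]
    edges.append((nodes[0], nodes[1], D[0][1]))
    return edges


labels = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
for edge in neighbor_joining(labels, dist):
    print(edge)
```

In practice the distance matrix would be derived from the MSA (for example, pairwise p-distances between aligned rows). The example matrix is additive, so NJ recovers the underlying tree exactly; an unrooted binary tree on n leaves has 2n - 3 edges.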
