Scientific Reports

FAMSA: Fast and accurate multiple sequence alignment of huge protein families


Abstract

Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein family databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the use of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimized and parallelized to make the most of modern computer platforms. As a result, the quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms such as Clustal Omega or MAFFT for datasets exceeding a few thousand sequences. This quality does not come at the expense of time or memory requirements, which are an order of magnitude lower than those of existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
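
The abstract names the longest common subsequence (LCS) as the measure behind pairwise similarities but gives no formula. As a rough illustration only, the Python sketch below computes the LCS length with the textbook dynamic program and turns it into a similarity score. The function names and the normalization by the shorter sequence length are assumptions made for this example, not FAMSA's definition, and the sketch makes no attempt at the optimized, parallelized implementation the abstract describes.

```python
from typing import List, Sequence


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook O(len(a) * len(b)) dynamic program that keeps one row at a time.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so the stored row is the shorter one
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization (an assumption): LCS length over the shorter length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


def similarity_matrix(seqs: Sequence[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise LCS similarities; the diagonal is set to 1.0."""
    n = len(seqs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = lcs_similarity(seqs[i], seqs[j])
    return sim


if __name__ == "__main__":
    # Toy strings, not real protein family members.
    print(lcs_length("AGGTAB", "GXTXAYB"))  # 4, the common subsequence "GTAB"
    print(similarity_matrix(["MKTAYIAK", "MKTAIAK", "GGGG"]))
```

A progressive aligner would typically feed a matrix like this into guide-tree construction. Filling it takes quadratic time in the number of sequences, which is why the choice of a cheap pairwise measure matters for families with hundreds of thousands of members.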
