FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Sebastian Deorowicz; Agnieszka Debudaj-Grabysz; Adam Gudyś


Abstract

Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein families databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. What matters is that its implementation is highly optimized and parallelized to make the most of modern computer platforms. Thanks to the above, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT for datasets exceeding a few thousand sequences. Quality does not compromise on time or memory requirements, which are an order of magnitude lower than those in the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
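To make the "longest common subsequence measure" concrete, here is a minimal Python sketch of an LCS-based pairwise similarity. It is an illustration only, not FAMSA's implementation: the abstract describes the real code as highly optimized and parallelized, while this uses plain dynamic programming. The function names `lcs_length` and `lcs_similarity` are hypothetical, and dividing by the shorter sequence length is an assumed normalization that the abstract does not specify.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    if len(b) > len(a):
        a, b = b, a  # iterate over the longer string so the DP row stays short
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching residues extend the best subsequence of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two shorter-prefix results.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization (not from the source): LCS length / shorter length."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0
```

In a progressive aligner, a matrix of such pairwise similarities would typically feed the construction of a guide tree that fixes the order in which sequences are merged.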


Bibliographic details

Source: Scientific Reports, 2016, issue 1
Authors: Sebastian Deorowicz; Agnieszka Debudaj-Grabysz; Adam Gudyś
Original format: PDF
Chinese Library Classification: General natural sciences

