...
Journal: BMC Bioinformatics

FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets

Abstract

Background
Amino acid replacement rate matrices are a crucial component of many protein analysis systems, such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix should reflect the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, generic pre-calculated matrices, which are sub-optimal for any particular data set, are typically used for protein-based phylogenetic inference. Sequence availability has now grown to the point where problem-specific rate matrices can often be calculated, provided the computational cost can be controlled.

Results
The most time-consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is an alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments before amino acid replacement rates are estimated. Experiments with several large data sets showed that the FastMG procedure was an order of magnitude faster than the same procedure without splitting. Importantly, there was no apparent loss in matrix quality when an appropriate splitting procedure was used.

Conclusions
FastMG is a simple, fast, and accurate procedure for estimating amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships of specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com.
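The abstract names the alignment-splitting step but not the rule used to assign sequences to sub-alignments. The Python sketch that follows illustrates only the general idea, under a loudly stated assumption: sequences are assigned at random to groups of a fixed size. The function `split_alignment`, its `group_size` and `seed` parameters, and the dict-based alignment representation are all hypothetical; this is not FastMG's code and not necessarily one of the splitting procedures evaluated in the paper. Each sub-alignment keeps every column but only a subset of the rows, so a maximum likelihood tree would be built for each small sub-alignment instead of one large tree for the full alignment.

```python
import random
from typing import Dict, List

# Hypothetical representation: sequence name -> aligned (gapped) amino acid string.
Alignment = Dict[str, str]


def split_alignment(alignment: Alignment, group_size: int, seed: int = 0) -> List[Alignment]:
    """Partition the sequences of `alignment` into non-overlapping sub-alignments.

    Assumed (hypothetical) strategy: shuffle the sequence names with a seeded RNG
    and cut them into consecutive chunks of `group_size`. A final chunk smaller
    than `group_size` is merged into the previous chunk, so every sub-alignment
    holds between `group_size` and 2 * group_size - 1 sequences (unless the
    whole input has fewer than `group_size` sequences).
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    names = sorted(alignment)           # deterministic starting order
    random.Random(seed).shuffle(names)  # random but reproducible assignment
    chunks = [names[i:i + group_size] for i in range(0, len(names), group_size)]

    # Fold an undersized last chunk into its predecessor.
    if len(chunks) > 1 and len(chunks[-1]) < group_size:
        tail = chunks.pop()
        chunks[-1].extend(tail)

    # All alignment columns are kept; only the rows (sequences) are distributed.
    return [{name: alignment[name] for name in chunk} for chunk in chunks]


if __name__ == "__main__":
    toy = {f"seq{i}": "MKV-LAT" for i in range(10)}
    parts = split_alignment(toy, group_size=4)
    print([len(p) for p in parts])  # [4, 6]
```

Random assignment is used here only because it is the simplest rule to illustrate. The abstract stresses that matrix quality holds up only when an appropriate splitting procedure is used, so a real analysis should follow the procedure described in the paper rather than this sketch.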