Journal: Molecular Biology and Evolution

An Improved General Amino Acid Replacement Matrix

Abstract

Amino acid replacement matrices are an essential basis of protein phylogenetics. They are used to compute substitution probabilities along phylogeny branches and thus the likelihood of the data. They are also essential in protein alignment. A number of replacement matrices and methods to estimate these matrices from protein alignments have been proposed since the seminal work of Dayhoff et al. (1972). An important advance was achieved by Whelan and Goldman (2001) and their WAG matrix, thanks to an efficient maximum likelihood estimation approach that accounts for the phylogenies of sequences within each training alignment. We further refine this method by incorporating the variability of evolutionary rates across sites in the matrix estimation and using a much larger and more diverse database than BRKALN, which was used to estimate WAG. To estimate our new matrix (called LG after the authors), we use an adaptation of the XRATE software and 3,912 alignments from Pfam, comprising ~50,000 sequences and ~6.5 million residues overall. To evaluate the performance of LG, we use an independent sample consisting of 59 alignments from TreeBase and randomly divide Pfam alignments into 3,412 training and 500 test alignments. The comparison with WAG and JTT shows a clear likelihood improvement. With TreeBase, we find that 1) the average Akaike information criterion gain per site is 0.25 and 0.42, when compared with WAG and JTT, respectively; 2) LG is significantly better than WAG for 38 alignments (among 59), and significantly worse for only 2 alignments; and 3) tree topologies inferred with LG, WAG, and JTT frequently differ, indicating that using LG impacts not only the likelihood value but also the output tree. Results with the test alignments from Pfam are analogous. LG and a PHYML implementation can be downloaded from http://atgc.lirmm.fr/LG.
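
The abstract notes that replacement matrices are used to compute substitution probabilities along phylogeny branches. A minimal sketch of that step follows, assuming the usual time-reversible formulation in which a rate matrix Q is built from symmetric exchangeabilities and stationary frequencies, and the probabilities for a branch of length t are P(t) = exp(Qt). The 4-state alphabet, the values in `EXCH` and `FREQS`, and all function names are hypothetical and chosen for illustration only; they are not the LG values and this is not the authors' implementation.

```python
import math


def build_rate_matrix(exch, freqs):
    """Build a reversible rate matrix: Q[i][j] = exch[i][j] * freqs[j] for i != j.

    Diagonal entries make each row sum to zero, and the matrix is rescaled
    so that the expected number of substitutions per unit time equals 1.
    """
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, terms=20):
    """Approximate P(t) = exp(Q t) by scaling and squaring a Taylor series."""
    n = len(q)
    norm = max(sum(abs(x) for x in row) for row in q) * t
    # Halve the exponent until its norm is at most 0.5, then square back.
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = t / (2 ** squarings)
    a = [[x * scale for x in row] for row in q]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Hypothetical toy model with 4 states (a real amino acid model has 20).
EXCH = [[0.0, 1.0, 2.0, 1.0],
        [1.0, 0.0, 1.0, 2.0],
        [2.0, 1.0, 0.0, 1.0],
        [1.0, 2.0, 1.0, 0.0]]
FREQS = [0.1, 0.2, 0.3, 0.4]

Q = build_rate_matrix(EXCH, FREQS)
P = transition_probs(Q, 0.5)  # branch length of 0.5 expected substitutions per site
for row in P:
    print(" ".join(f"{x:.4f}" for x in row))
```

Each row of `P` gives the probabilities of ending in each state after the branch, given the starting state. In a likelihood calculation these matrices would be evaluated for every branch of the tree; a production implementation would more likely use an eigendecomposition of Q than a truncated series.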
