Journal: Systematic Biology

ModelOMatic: Fast and Automated Model Selection between RY, Nucleotide, Amino Acid, and Codon Substitution Models



Abstract

Molecular phylogenetics is a powerful tool for inferring both the process and pattern of evolution from genomic sequence data. Statistical approaches, such as maximum likelihood and Bayesian inference, are now established as the preferred methods of inference. The choice of models that a researcher uses for inference is of critical importance, and there are established methods for model selection conditioned on a particular type of data, such as nucleotides, amino acids, or codons. A major limitation of existing model selection approaches is that they can only compare models acting upon a single type of data. Here, we extend model selection to allow comparisons between models describing different types of data by introducing the idea of adapter functions, which project aggregated models onto the originally observed sequence data. These projections are implemented in the program ModelOMatic and used to perform model selection on 3722 families from the PANDIT database, 68 genes from an arthropod phylogenomic data set, and 248 genes from a vertebrate phylogenomic data set. For the PANDIT and arthropod data, we find that amino acid models are selected for the overwhelming majority of alignments, with progressively smaller numbers of alignments selecting codon and nucleotide models, and no families selecting RY-based models. In contrast, nearly all alignments from the vertebrate data set select codon-based models. The sequence divergence, the number of sequences, and the degree of selection acting upon the protein sequences may contribute to explaining this variation in model selection. Our ModelOMatic program is fast, with most families from PANDIT taking fewer than 150 s to complete, and should therefore be easily incorporated into existing phylogenetic pipelines. ModelOMatic is available at https://code.google.com/p/modelomatic/.
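To make the adapter-function idea concrete, here is a minimal Python sketch for the simplest case, projecting an RY (purine/pyrimidine) model onto nucleotide data. It is not ModelOMatic's code and makes several assumptions the abstract does not state: that the probability of a nucleotide given its RY class is its empirical within-class frequency, that this adds one free parameter per class, and that models are compared by AIC. The function names are hypothetical.

```python
import math
from collections import Counter

# Standard RY recoding: purines (A, G) -> R, pyrimidines (C, T) -> Y.
RY_CLASS = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def adapter_log_prob(sequences):
    """Sum of log P(nucleotide | RY class) over all observed characters.

    Assumption: P(nucleotide | class) is the empirical frequency of that
    nucleotide among all observed characters of the same class.
    Characters outside A/C/G/T (gaps, ambiguity codes) are skipped.
    """
    counts = Counter(ch for seq in sequences for ch in seq if ch in RY_CLASS)
    class_totals = Counter()
    for nuc, n in counts.items():
        class_totals[RY_CLASS[nuc]] += n
    return sum(
        n * math.log(n / class_totals[RY_CLASS[nuc]])
        for nuc, n in counts.items()
    )


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def projected_ry_aic(ry_log_likelihood, ry_params, sequences):
    """AIC of an RY model after projection onto the nucleotide alignment.

    Assumption: the adapter adds two free parameters, one within-class
    frequency for R and one for Y.
    """
    log_likelihood = ry_log_likelihood + adapter_log_prob(sequences)
    return aic(log_likelihood, ry_params + 2)
```

After projection, the RY model's score refers to the same nucleotide alignment as a nucleotide model's score, so the two values can be compared directly. The log-likelihood of the recoded alignment (`ry_log_likelihood`) would come from a separate phylogenetic fit, which this sketch does not perform.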

