BMC Evolutionary Biology

SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics

Abstract

Background
Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolving evolutionary questions. However, assembling large datasets raises several difficulties, such as multiple copies of a gene per species (paralogous or xenologous genes), the absence of some genes for a given species, and partial sequences. Using undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and using partial sequences can lead to a lack of resolution. A tool that selects sequences, species, and genes while dealing with these issues is needed in a phylogenomics context.

Results
Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras from partial sequences, or selects, among multiple sequences, the orthologous and/or slowest-evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the level of missing data the user allows and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for sequence selection, a semi-automatic mode is available to accommodate the user's expertise.

Conclusion
SCaFoS can handle datasets of hundreds of species and genes, at either the amino acid or the nucleotide level. It has a graphical interface and can be integrated into an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimeric sequences to reduce missing data, and selects genes according to their level of missing data. Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially when the dataset includes fast-evolving species.
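The abstract does not state the exact rules SCaFoS uses, so the following is only a minimal Python sketch of two of the steps it describes: fusing aligned partial sequences of one monophyletic group into a chimera, and retaining genes under a user-chosen missing-data threshold before concatenating them into a super-matrix. The function names (`fuse_partial`, `missing_fraction`, `build_supermatrix`), the "first non-gap residue wins" fusion rule, the `max_missing` parameter and the `?` filler for absent groups are all hypothetical illustrations, not SCaFoS's actual implementation or options.

```python
from typing import Dict, List, Tuple

GAP = "-"      # gap character in the input alignments
MISSING = "?"  # filler for a group absent from a gene (hypothetical convention)


def fuse_partial(seqs: List[str]) -> str:
    """Fuse aligned partial sequences of one monophyletic group into a chimera.

    Hypothetical rule: at each alignment column, keep the first non-gap
    residue found, scanning the sequences in the order given.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sequences must come from the same alignment")
    chimera = []
    for col in range(width):
        residues = [s[col] for s in seqs if s[col] != GAP]
        chimera.append(residues[0] if residues else GAP)
    return "".join(chimera)


def missing_fraction(aln: Dict[str, str], groups: List[str]) -> float:
    """Fraction of groups absent from a gene alignment (or present only as gaps)."""
    absent = sum(1 for g in groups if not aln.get(g, "").strip(GAP))
    return absent / len(groups)


def build_supermatrix(
    genes: Dict[str, Dict[str, str]], groups: List[str], max_missing: float
) -> Tuple[List[str], Dict[str, str]]:
    """Keep genes whose missing fraction is <= max_missing, then concatenate them."""
    kept = [
        name
        for name in sorted(genes)  # sorted only to make the output deterministic
        if genes[name] and missing_fraction(genes[name], groups) <= max_missing
    ]
    rows: Dict[str, List[str]] = {g: [] for g in groups}
    for name in kept:
        aln = genes[name]
        width = len(next(iter(aln.values())))
        for g in groups:
            # A group absent from this gene is padded with missing-data symbols.
            rows[g].append(aln.get(g, MISSING * width))
    return kept, {g: "".join(parts) for g, parts in rows.items()}


if __name__ == "__main__":
    print(fuse_partial(["MK----", "--VLAG"]))  # MKVLAG

    genes = {
        "geneA": {"Fungi": "MKV-", "Plants": "MKIL", "Animals": "MRVL"},
        "geneB": {"Fungi": "AC", "Plants": "AG"},  # Animals missing (1/3)
        "geneC": {"Fungi": "WW"},                  # 2 of 3 groups missing
    }
    kept, matrix = build_supermatrix(genes, ["Animals", "Fungi", "Plants"], 0.34)
    print(kept)    # ['geneA', 'geneB']
    print(matrix)  # {'Animals': 'MRVL??', 'Fungi': 'MKV-AC', 'Plants': 'MKILAG'}
```

The "first non-gap residue wins" rule silently resolves conflicts where partial sequences overlap and disagree. A more careful sketch would check the overlapping region before fusing, which is the kind of judgement the abstract leaves to the user's expertise.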