
Fully automated sequence alignment methods are comparable to and much faster than traditional methods in large data sets: an example with hepatitis B virus

Abstract

Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important step, but one that has become increasingly computationally expensive with the recent surge in DNA sequence data. Much of this sequence data is publicly available but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected “by eye” prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required by modern phylogenetics and produces alignments that are not reproducible. Recently, methods have been developed for fully automating the alignment of large data sets, but it is unclear whether these methods yield phylogenies compatible with those obtained from more traditional approaches that combine automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set composed exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach followed by manual “by eye” editing, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared the tree topologies inferred from these different alignments using multiple metrics. We further determined whether the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies.
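The abstract does not name the topology metrics that were used. As an illustrative assumption only, the sketch below uses the rooted (cluster) form of the Robinson-Foulds distance, which counts clades present in one tree but not the other, and ignores clades whose support falls below a chosen threshold (equivalent to collapsing those branches). The `Node`, `clusters`, and `rf_distance` names and the toy five-taxon trees are hypothetical, not the authors' data or code.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass
class Node:
    """Rooted tree node: leaves have a name, internal nodes may have a support value."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    support: Optional[float] = None  # e.g. a bootstrap value; None if not reported


def leaf(name: str) -> Node:
    return Node(name=name)


def clade(*children: Node, support: Optional[float] = None) -> Node:
    return Node(children=list(children), support=support)


def clusters(root: Node, min_support: float = 0.0) -> Set[FrozenSet[str]]:
    """Non-trivial clades of the tree (no single leaves, no root).

    Clades with support below min_support are skipped, which yields the same
    clade set as collapsing those branches into polytomies.
    Missing support is treated as 0.
    """
    found: Set[FrozenSet[str]] = set()

    def walk(node: Node) -> FrozenSet[str]:
        if not node.children:
            return frozenset([node.name])
        taxa = frozenset().union(*(walk(child) for child in node.children))
        if node is not root and (node.support or 0.0) >= min_support:
            found.add(taxa)
        return taxa

    walk(root)
    return found


def rf_distance(a: Node, b: Node, min_support: float = 0.0) -> int:
    """Rooted Robinson-Foulds distance: number of clades found in only one tree."""
    return len(clusters(a, min_support) ^ clusters(b, min_support))


# Toy trees: ((A,B)90,(C,D)40,E) versus ((A,B)95,(C,E)60,D)
tree_a = clade(clade(leaf("A"), leaf("B"), support=90),
               clade(leaf("C"), leaf("D"), support=40),
               leaf("E"))
tree_b = clade(clade(leaf("A"), leaf("B"), support=95),
               clade(leaf("C"), leaf("E"), support=60),
               leaf("D"))

for threshold in (0, 50, 70):
    print(threshold, rf_distance(tree_a, tree_b, threshold))
# 0 2
# 50 1
# 70 0
```

In this toy case the two trees disagree only on poorly supported clades, so raising the threshold drives the distance from 2 to 0, a miniature version of lower support thresholds producing more differences among trees.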
Although results varied across branch support thresholds, lower support thresholds tended to produce more differences among trees. Therefore, the differences between trees could best be explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated MSA algorithms are fully compatible with older methods, even in extremely difficult-to-align data sets. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily sound (monophyletic) groups, regardless of alignment type and support threshold. This suggests that there may be errors in genotype classification in the database or that HBV genotypes may need revision.
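Whether a genotype corresponds to an evolutionarily sound group can be framed as asking whether its members are exactly the leaves of one clade that passes the support threshold. The following is a minimal sketch under that assumption, for a rooted tree; the sequence names (`s1` to `s6`), the genotype assignments, and the support values are made up for illustration, and `accepted_clades` and `monophyly_report` are hypothetical helpers, not part of any published pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class Node:
    """Rooted tree node: leaves have a name, internal nodes may have a support value."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    support: Optional[float] = None


def leaf(name: str) -> Node:
    return Node(name=name)


def clade(*children: Node, support: Optional[float] = None) -> Node:
    return Node(children=list(children), support=support)


def accepted_clades(root: Node, min_support: float) -> Set[FrozenSet[str]]:
    """Leaf sets of every single leaf, the root, and each internal clade whose
    support is at least min_support (missing support counts as 0)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Node) -> FrozenSet[str]:
        if not node.children:
            taxa = frozenset([node.name])
            found.add(taxa)  # a single sequence is trivially a clade
            return taxa
        taxa = frozenset().union(*(walk(child) for child in node.children))
        if node is root or (node.support or 0.0) >= min_support:
            found.add(taxa)
        return taxa

    walk(root)
    return found


def monophyly_report(root: Node, labels: Dict[str, str],
                     min_support: float) -> Dict[str, bool]:
    """Map each label (e.g. a genotype) to True if its members are exactly
    the leaves of one accepted clade."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for taxon, label in labels.items():
        groups[label].add(taxon)
    clades = accepted_clades(root, min_support)
    return {label: frozenset(members) in clades
            for label, members in sorted(groups.items())}


# Hypothetical tree ((s1,s2)95,(s3,(s4,s5)80)55,s6) with made-up genotype labels
tree = clade(clade(leaf("s1"), leaf("s2"), support=95),
             clade(leaf("s3"), clade(leaf("s4"), leaf("s5"), support=80), support=55),
             leaf("s6"))
labels = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "B", "s6": "C"}

print(monophyly_report(tree, labels, min_support=50))  # {'A': True, 'B': True, 'C': True}
print(monophyly_report(tree, labels, min_support=70))  # {'A': True, 'B': False, 'C': True}
```

Here genotype B is recovered as a clade only when a support of 55 is accepted; at a threshold of 70 it is no longer supported as monophyletic. For unrooted trees, the same test would be made against bipartitions rather than rooted clades.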