Fully automated sequence alignment methods are comparable to and much faster than traditional methods in large data sets: an example with hepatitis B virus

Abstract

Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important but increasingly computationally expensive step, given the recent surge in DNA sequence data. Much of this sequence data is publicly available but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected “by eye” prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required of modern phylogenetics and results in alignments that are not reproducible. Recently, methods have been developed for fully automating alignments of large data sets, but it is unclear whether these methods produce alignments that result in compatible phylogenies when compared to more traditional alignment approaches that combine automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set composed exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach including manual editing by eye, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared the tree topologies resulting from these different alignment methods using multiple metrics. We further determined whether the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies. Although there was variability between branch support thresholds, allowing lower support thresholds tended to result in more differences among trees. Therefore, differences between the trees could be best explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated algorithms for MSA are fully compatible with older methods, even in data sets that are extremely difficult to align. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily sound groups, regardless of alignment type and support threshold. This suggests that there may be errors in genotype classification in the database or that HBV genotypes may need revision.
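
The two comparisons described in the abstract (tree topology differences and genotype monophyly) can be illustrated with a minimal Python sketch. The abstract does not name the metrics or software the authors used, so everything here is an assumption made for illustration: trees are written as nested tuples, the topology measure is the symmetric difference of clade sets (the rooted analogue of the Robinson-Foulds distance), the sequence names and genotype labels are hypothetical, and branch support thresholds are not modelled.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples; leaves are sequence names (hypothetical IDs).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the set of leaves found under every internal node."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_distance(a: Tree, b: Tree) -> int:
    """Count clades present in exactly one of the two rooted trees."""
    return len(clades(a) ^ clades(b))


def is_monophyletic(tree: Tree, group: Set[str]) -> bool:
    """True if the group's members form exactly one clade of the tree."""
    if len(group) == 1:
        # A single sequence is trivially monophyletic if it is in the tree.
        return any(group <= clade for clade in clades(tree))
    return frozenset(group) in clades(tree)


# Two toy trees standing in for phylogenies built from different alignments.
tree_a: Tree = ((("A1", "A2"), ("B1", "B2")), ("C1", "C2"))
tree_b: Tree = ((("A1", "B1"), ("A2", "B2")), ("C1", "C2"))

# Hypothetical genotype assignments for the toy sequences.
genotypes: Dict[str, Set[str]] = {
    "A": {"A1", "A2"},
    "B": {"B1", "B2"},
    "C": {"C1", "C2"},
}

print(clade_distance(tree_a, tree_b))
for name, members in genotypes.items():
    print(name, is_monophyletic(tree_a, members), is_monophyletic(tree_b, members))
```

In this toy case the two trees disagree only on how the A and B sequences are grouped, so genotypes A and B are monophyletic in the first tree but not in the second, while genotype C is monophyletic in both.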

Bibliographic details

Journal: other
Authors: Therese A. Catanach; Andrew D. Sweet; Nam-phuong D. Nguyen; Rhiannon M. Peery; Andrew H. Debevec; Andrea K. Thomer; Amanda C. Owings; Bret M. Boyd; Aron D. Katz; Felipe N. Soto-Adames; Julie M. Allen
Volume: 7
Pages: e6142
Total pages: 25
Original format: PDF
Keywords: Genome; Automated alignment; Manual alignment; Virus; s-region; HBV
Record added: 2022-08-21 11:06:10
