
Parameter advising for multiple sequence alignment.



Abstract

The problem of aligning multiple protein sequences is essential to many biological analyses, but most standard formulations of the problem are NP-complete. Because the problem is both difficult and practically important, researchers have many heuristic multiple sequence aligners at their disposal. A basic issue that frequently arises is that each of these alignment tools has a multitude of parameters that must be set, and these settings greatly affect the quality of the alignment produced. Most users rely on the default parameter setting that comes with the aligner, which is optimal on average but can produce a low-quality alignment for a given input.

This dissertation develops an approach called parameter advising to find a parameter setting that produces a high-quality alignment for each given input. A parameter advisor aligns the input sequences under each choice in a collection of parameter settings, and then selects the best of the resulting alignments. A parameter advisor has two major components: (i) an advisor set of parameter choices that are given to the aligner, and (ii) an accuracy estimator that is used to rank the alignments produced by the aligner.

Alignment accuracy is measured with respect to a known reference alignment, but in practice a reference alignment is not available, so accuracy can only be estimated. We develop a new accuracy estimator called Facet (short for "feature-based accuracy estimator") that computes an accuracy estimate as a linear combination of efficiently computable feature functions, whose coefficients are learned by solving a large-scale linear programming problem. We also develop an efficient approximation algorithm for finding an advisor set of a given cardinality for a fixed estimator; the cardinality should ideally be small, as the aligner is invoked for each parameter choice in the set.

Using Facet for parameter advising boosts advising accuracy by almost 20% beyond using a single default parameter choice for the hardest-to-align benchmarks.

This dissertation further applies parameter advising in two ways: (i) to ensemble alignment, which uses the advising process on a collection of aligners to choose both the aligner and its parameter settings, and (ii) to adaptive local realignment, which can align different regions of the input sequences with distinct parameter choices to conform to mutation rates as they vary across the lengths of the sequences.
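The advising loop described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the names `linear_estimator`, `advise`, `toy_aligner`, the `"pad"` parameter, the two feature functions, and the coefficients are all hypothetical stand-ins, since the abstract names neither Facet's actual features nor its learned coefficients. The sketch only shows the shape of the idea: run the aligner once per parameter choice in the advisor set, score each result with a linear combination of features, and keep the highest-scoring alignment.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Alignment = List[str]    # one gapped row per input sequence, all of equal length
Params = Dict[str, str]  # one parameter choice; the keys here are hypothetical
Feature = Callable[[Alignment], float]
Estimator = Callable[[Alignment], float]
Aligner = Callable[[Sequence[str], Params], Alignment]


def linear_estimator(features: Sequence[Feature],
                     coeffs: Sequence[float]) -> Estimator:
    """Build an estimator that returns a weighted sum of feature values."""
    if len(features) != len(coeffs):
        raise ValueError("need exactly one coefficient per feature")

    def estimate(alignment: Alignment) -> float:
        return sum(c * f(alignment) for c, f in zip(coeffs, features))

    return estimate


def advise(sequences: Sequence[str], advisor_set: Sequence[Params],
           aligner: Aligner, estimator: Estimator) -> Tuple[Params, Alignment, float]:
    """Align under every parameter choice and return the top-ranked result."""
    if not advisor_set:
        raise ValueError("advisor set must not be empty")
    best = None
    for params in advisor_set:
        alignment = aligner(sequences, params)  # one aligner call per choice
        score = estimator(alignment)
        if best is None or score > best[2]:
            best = (params, alignment, score)
    return best


# Toy features (assumptions for illustration, not Facet's real features).
def non_gap_fraction(alignment: Alignment) -> float:
    """Fraction of alignment cells that are not gap characters."""
    total = sum(len(row) for row in alignment)
    if total == 0:
        return 0.0
    return sum(ch != "-" for row in alignment for ch in row) / total


def conserved_column_fraction(alignment: Alignment) -> float:
    """Fraction of columns whose characters are identical and not gaps."""
    columns = list(zip(*alignment))
    if not columns:
        return 0.0
    conserved = sum(1 for col in columns
                    if col[0] != "-" and len(set(col)) == 1)
    return conserved / len(columns)


def toy_aligner(sequences: Sequence[str], params: Params) -> Alignment:
    """Stand-in for a real aligner: pads rows with gaps on one side."""
    width = max(len(s) for s in sequences)
    if params["pad"] == "left":
        return [s.rjust(width, "-") for s in sequences]
    return [s.ljust(width, "-") for s in sequences]


estimator = linear_estimator([non_gap_fraction, conserved_column_fraction],
                             [0.5, 0.5])  # made-up coefficients
advisor_set = [{"pad": "right"}, {"pad": "left"}]
best_params, best_alignment, best_score = advise(
    ["ACGT", "CGT"], advisor_set, toy_aligner, estimator)
```

On this toy input the left-padded alignment wins, because three of its four columns are conserved while the right-padded one has none. The cost of advising grows with the size of the advisor set, since the aligner runs once per choice, which is why the abstract stresses keeping the set's cardinality small.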

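Bibliographic record

  • Author

    DeBlasio, Daniel Frank

  • Author affiliation

    The University of Arizona

  • Degree-granting institution: The University of Arizona
  • Subject: Computer science; Bioinformatics
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 195 p.
  • Total pages: 195
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords: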
