Combinatorial optimization and application to DNA sequence analysis.

Abstract

With recent and continuing advances in bioinformatics, the volume of sequence data has increased tremendously. Along with this increase, there is a growing need to develop efficient algorithms to process such data in order to make useful and important discoveries. Careful analysis of genomic data will benefit science and society in numerous ways, including the understanding of protein sequence functions, early detection of diseases, and finding evolutionary relationships that exist among various organisms.

Most sequence analysis problems arising from computational genomics and evolutionary biology fall into the class of NP-complete problems. Advances in exact and approximate algorithms to address these problems are critical. In this thesis, we investigate a novel graph theoretical model that deals with fundamental evolutionary problems. The model allows incorporation of the evolutionary operations "insertion", "deletion", and "substitution", and various parameters such as relative distances and weights. By varying appropriate parameters and weights within the model, several important combinatorial problems can be represented, including the weighted supersequence, weighted superstring, and weighted longest common subsequence problems. Consequently, our model provides a general computational framework for solving a wide variety of important and difficult biological sequencing problems, including the multiple sequence alignment problem and the problem of finding an evolutionary ancestor of multiple sequences.

In this thesis, we develop large scale combinatorial optimization techniques to solve our graph theoretical model. In particular, we formulate the problem as two distinct but related models: a constrained network flow problem and a weighted node packing problem. The integer programming models are solved in a branch and bound setting using simultaneous column and row generation. The methodology developed will also be useful for solving large scale integer programming problems arising in other areas such as transportation and logistics.
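As a small illustration of the three evolutionary operations named in the abstract, the sketch below computes a weighted edit distance between two sequences with the textbook dynamic program. It is not the graph theoretical model or the integer programming method of the thesis. The function name `weighted_edit_distance` and the weight parameters `w_ins`, `w_del`, and `w_sub` are assumptions introduced only for this example.

```python
def weighted_edit_distance(a, b, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Minimum total weight of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b.

    Illustrative only: the weights are hypothetical parameters, not the
    ones used in the thesis.
    """
    m, n = len(a), len(b)
    # dist[i][j] holds the cheapest way to turn a[:i] into b[:j].
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * w_del  # delete every symbol of a[:i]
    for j in range(1, n + 1):
        dist[0][j] = j * w_ins  # insert every symbol of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Matching symbols cost nothing; mismatches cost a substitution.
            diagonal = dist[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else w_sub)
            dist[i][j] = min(
                diagonal,
                dist[i - 1][j] + w_del,  # delete a[i-1]
                dist[i][j - 1] + w_ins,  # insert b[j-1]
            )
    return dist[m][n]


if __name__ == "__main__":
    print(weighted_edit_distance("ACGT", "AGT"))
    print(weighted_edit_distance("ACGT", "AGGT", w_sub=3.0))
```

Changing the relative weights changes which operations an optimal edit script prefers: when a substitution costs more than a deletion plus an insertion, the dynamic program replaces a symbol by deleting it and inserting the new one.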

Bibliographic record

  • Author: Gupta, Kapil
  • Author affiliation: Georgia Institute of Technology
  • Degree-granting institution: Georgia Institute of Technology
  • Subjects: Engineering System Science; Biology Bioinformatics
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 111 p.
  • Total pages: 111
  • Original format: PDF
  • Language of text: eng