Sequence analysis methods for the detection of promoters and transcription factor binding sites.


Abstract

The detection of promoters and their associated transcription factor binding sites (DNA motifs) is an increasingly important biological problem. Knowledge of the location of every regulatory sequence in an organism would bring us one step closer to a computational model of the cell.

As our understanding of these binding sites becomes more sophisticated, the computational models used in their analysis must also advance. Accordingly, there is a need to incorporate increasing amounts of disparate data into these models. In this thesis we investigate the use of graphs to represent and detect transcription factor binding sites. We further augment these motif-finding algorithms by applying machine-learning techniques to incorporate heterogeneous data. In ancillary experiments, we apply aspects of this work to detecting promoters in the E. coli genome.

We developed the MotifCut algorithm, a novel ab initio motif-finding algorithm. This method uses a graph-based representation of DNA sequence and methods from fractional programming to deterministically find the set of segments in a DNA sequence that are the most similar. These highly similar DNA segments are the most likely constituents of a motif in the sequence.

The MotifScan algorithm uses the same graphical representation as MotifCut to detect new examples of known binding sites. This algorithm detects new binding sites by comparison to clusters of k-mers in the original motif graph. The MotifScan algorithm was further extended by the addition of classification algorithms (specifically, Support Vector Machines (SVMs)) applied to external data. This allows us to add some context to a putative binding site. For example, binding sites often function together as modules of regulation. Therefore, the location of other binding sites nearby can help us determine whether a binding site is real or not.

The methods developed to augment MotifScan were applied to the problem of promoter detection in E. coli. In this project we investigated the use of SVMs to combine data from a number of heterogeneous data sources, including inferred DNA structure, to improve our ability to detect bacterial promoters. In the process we can learn something about the signals that RNA Polymerase itself responds to.
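The graph-based idea in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the edge weights or the optimization details, so everything here is an assumption made for illustration. Nodes are k-mer start positions, edges link non-overlapping k-mers within a Hamming-distance cutoff, and the "most similar set" is read as the subgraph with the highest total edge weight per node. A simple greedy peeling heuristic stands in for the fractional-programming step, and the names `build_graph`, `densest_subgraph`, and `max_mismatch` are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return (start position, k-mer) for every window of length k."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_graph(seq, k, max_mismatch):
    """Build a similarity graph over k-mer positions.

    Assumed weighting (not from the thesis): (k - mismatches) / k for
    non-overlapping pairs with at most max_mismatch mismatches.
    """
    windows = kmers(seq, k)
    weights = {}
    for (i, a), (j, b) in combinations(windows, 2):
        if j - i < k:
            continue  # skip overlapping windows
        d = hamming(a, b)
        if d <= max_mismatch:
            weights[(i, j)] = (k - d) / k
    return [i for i, _ in windows], weights


def densest_subgraph(nodes, weights):
    """Greedy peeling heuristic for a dense subgraph.

    Repeatedly removes the node with the lowest weighted degree and keeps
    the node set with the best density (total edge weight / node count).
    This is a stand-in for an exact fractional-programming solver.
    """
    adj = {n: {} for n in nodes}
    for (i, j), w in weights.items():
        adj[i][j] = w
        adj[j][i] = w
    degree = {n: sum(adj[n].values()) for n in nodes}
    total = sum(weights.values())
    alive = set(nodes)
    best_set, best_density = set(), 0.0
    while alive:
        density = total / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density
        worst = min(alive, key=lambda n: (degree[n], n))
        for neighbor, w in adj[worst].items():
            if neighbor in alive:
                degree[neighbor] -= w
                total -= w
        alive.remove(worst)
    return best_set, best_density


if __name__ == "__main__":
    nodes, weights = build_graph("ACGTTTACGT", k=4, max_mismatch=0)
    print(densest_subgraph(nodes, weights))
```

In the toy sequence the only repeated, non-overlapping 4-mer is `ACGT`, so the dense subgraph consists of its two start positions. A realistic motif finder would also need a background model and a principled edge weighting, neither of which is described in the abstract.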

Bibliographic record

  • Author

    Naughton, Brian Thomas

  • Author affiliation

    Stanford University

  • Degree-granting institution: Stanford University
  • Subject: Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 126
  • Original format: PDF
  • Language of text: English
