Computational identification of discriminative sequence motifs with dynamic search spaces.

Abstract

Regulatory regions in mammalian genomes play important roles both in development and in the maintenance of cellular homeostasis. Mutations in these regulatory regions are implicated in several disease phenotypes. Understanding the precise role of these regions requires detailed maps of where regulatory proteins bind to DNA. Experimentally determined genome-wide maps of protein binding are available, but only at fairly coarse resolution; they cannot pinpoint the exact locations in the DNA where the proteins bind. Computational methods can identify specific putative binding locations within the broader loci and build a model of the DNA sequences to which the protein binds. Yet state-of-the-art computational approaches to identifying specific DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called INSPECTOR, designed to find specific, predictive motifs rather than merely over-represented sequence elements. The key distinguishing features of this algorithm are that it uses a dynamic search space to find discriminative motifs and that it models binding motifs using a full PWM (position weight matrix) rather than k-mers or regular expressions. We demonstrate that INSPECTOR finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that these motifs classify the ChIP-seq signals better than motifs from existing algorithms. We also show that INSPECTOR outperforms a technology-specific algorithm in finding predictive motifs in protein-binding microarray (PBM) datasets. Finally, we apply this algorithm to detect motifs in C. elegans expression datasets, using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
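
The abstract contrasts full PWM motif models with k-mers and emphasizes motifs that discriminate bound from unbound sequences rather than motifs that are merely over-represented. The sketch below illustrates only those two general ideas: it converts hypothetical per-position base counts into a log-odds PWM, scores each sequence by its best-matching window, and uses ROC AUC to measure how well those scores separate a toy "bound" set from a toy "unbound" set. It is not INSPECTOR. The abstract does not specify INSPECTOR's objective function or how its dynamic search space is updated, and the function names, the pseudocount of 0.5, the uniform background, the forward-strand-only scan, and all sequences are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"

def log_odds_pwm(counts: List[Dict[str, float]],
                 background: Dict[str, float],
                 pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2(motif prob / background prob).

    Hypothetical helper; the pseudocount keeps unseen bases from scoring -inf.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0.0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm

def best_site_score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Highest PWM score over all windows of seq (forward strand only).

    Windows containing a non-ACGT character are skipped; returns -inf if
    no valid window exists (e.g. seq is shorter than the motif).
    """
    width = len(pwm)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        best = max(best, score)
    return best

def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy counts for a hypothetical 4-bp motif resembling "GATA".
counts = [
    {"G": 9, "A": 1},
    {"A": 10},
    {"T": 9, "C": 1},
    {"A": 10},
]
background = {b: 0.25 for b in BASES}  # assumed uniform background
pwm = log_odds_pwm(counts, background)

# Hypothetical "bound" and "unbound" sequences; not real ChIP-seq data.
bound = ["CCGATACC", "TTAGATAA", "GGGATTCA"]
unbound = ["CCCCCCCC", "TTTTGGGG", "ACACACAC"]

auc = roc_auc([best_site_score(pwm, s) for s in bound],
              [best_site_score(pwm, s) for s in unbound])
print(f"ROC AUC on toy data: {auc:.2f}")  # prints: ROC AUC on toy data: 1.00
```

Because a full PWM assigns a separate weight to every base at every position, a near-match such as GATT in the third "bound" sequence still earns partial credit, whereas an exact k-mer or regular-expression match is all-or-nothing. Scoring a motif by AUC rather than by an enrichment count is one simple way to reward predictive power; many motif scanners also scan the reverse-complement strand, which this sketch omits.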

Bibliographic details

  • Author: Karnik, Rahul
  • Author affiliation: The Johns Hopkins University
  • Degree-granting institution: The Johns Hopkins University
  • Subjects: Biology, Bioinformatics; Engineering, Computer
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 101
  • Original format: PDF
  • Language: English
