Journal: PLoS Computational Biology

Discovering Motifs in Ranked Lists of DNA Sequences

Abstract

Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
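The abstract does not spell out how enrichment is scored without fixing a target/background cutoff in advance. One common way to do this for a ranked list is a minimum hypergeometric score: try every cutoff, compute the hypergeometric tail probability of seeing at least as many motif occurrences above it, and keep the smallest. The Python sketch below illustrates that idea under stated assumptions. It treats each sequence as a single 0/1 motif occurrence (so it ignores motif multiplicity), the function names are hypothetical, and it should not be read as DRIM's actual implementation.

```python
from math import comb


def hypergeom_tail(total, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(total, successes, draws)."""
    upper = min(draws, successes)
    numerator = sum(
        comb(successes, k) * comb(total - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return numerator / comb(total, draws)


def min_hypergeom_score(occurrences):
    """Scan every cutoff of a ranked 0/1 list and return (best_score, best_cutoff).

    occurrences[i] is 1 if the motif occurs in the sequence ranked i-th
    (best rank first), else 0. This is an illustrative assumption about
    the input, not a format taken from the paper.
    """
    total = len(occurrences)
    successes = sum(occurrences)
    best_score, best_cutoff = 1.0, 0
    hits = 0
    for cutoff in range(1, total + 1):
        hits += occurrences[cutoff - 1]
        tail = hypergeom_tail(total, successes, cutoff, hits)
        if tail < best_score:
            best_score, best_cutoff = tail, cutoff
    return best_score, best_cutoff


if __name__ == "__main__":
    ranked = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    score, cutoff = min_hypergeom_score(ranked)
    print(score, cutoff)
```

Because the minimum is taken over many cutoffs, the returned score is not itself a p-value. It would still need a correction for the cutoff search, which this sketch does not attempt.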
