Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random forest

Abstract

Background: Many transcription factors (TFs) have been observed to bind different genomic loci depending on the cell type in which they are expressed, even though an individual TF usually binds the same core motif across cell types. How a TF achieves such highly cell-type-specific binding is a critical research question. One hypothesis is that a TF requires co-binding of different partner TFs in different cell types. If this is the case, different combinations of TF motifs (a motif grammar) should be observable at the TF's binding sites in different cell types. In this study, we develop a bioinformatics method that systematically identifies DNA motifs in TF binding sites across multiple cell types from published ChIP-seq data, and we address two questions: (1) can a machine learning classifier predict cell-type specificity from motif combinations alone, and (2) can meaningful cell-type-specific motif grammars be extracted from this classifier model?

Results: We present a Random Forest (RF) based approach that builds a multi-class classifier to predict the cell-type specificity of a TF binding site from its motif content. We applied this RF classifier to two published ChIP-seq datasets, one for TCF7L2 and one for MAX, each profiled across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell type. Furthermore, we present a rule mining approach that extracts the most discriminatory rules from the RF classifier, allowing us to discover the underlying cell-type-specific motif grammar.

Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.
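The abstract does not specify the feature encoding, tree settings, or rule-mining criteria used by the authors, so the following is only a generic, standard-library sketch of the overall idea under stated assumptions: each binding site is encoded as binary motif presence/absence features, a small random forest (Gini-split trees on bootstrap samples with a random feature subset per node) predicts the cell type, k-fold cross-validation estimates accuracy, and candidate "grammar" rules are read off as root-to-leaf tree paths ranked by confidence and support. The motif names, the cell-type labels `cellA`/`cellB`/`cellC`, the toy data, and all function names here are hypothetical illustrations, not the paper's data or method.

```python
import random
from collections import Counter

# Hypothetical motif vocabulary; each binding site is encoded as motif presence/absence.
MOTIFS = ["TCF7L2", "FOXA", "HNF4", "GATA", "AP1"]


def encode(site_motifs):
    """Binary motif-presence vector for one binding site (assumed feature scheme)."""
    return [1 if m in site_motifs else 0 for m in MOTIFS]


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic sites for three hypothetical cell types, each with its own motif grammar."""
    rng = random.Random(seed)
    grammar = {"cellA": {"TCF7L2", "FOXA", "HNF4"},
               "cellB": {"TCF7L2", "AP1"},
               "cellC": {"TCF7L2", "GATA"}}
    X, y = [], []
    for label, core in grammar.items():
        for _ in range(n_per_class):
            motifs = {m for m in core if rng.random() < 0.9}     # core motifs, occasionally missed
            motifs |= {m for m in MOTIFS if rng.random() < 0.1}  # background motif hits
            X.append(encode(motifs))
            y.append(label)
    return X, y


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Leaf = class label; internal node = (feature, absent_subtree, present_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset at each node
        left = [i for i, x in enumerate(X) if x[f] == 0]
        right = [i for i, x in enumerate(X) if x[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority
    _, f, left, right = best
    return (f,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_feats, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_feats, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, absent, present = node
        node = present if x[f] else absent
    return node


def fit_forest(X, y, n_trees=25, max_depth=3, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_feats, rng))
    return trees


def predict_forest(trees, x):
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]  # majority vote


def cross_val_accuracy(X, y, k=5, seed=0):
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    correct = 0
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        trees = fit_forest([X[i] for i in train], [y[i] for i in train], seed=fold)
        correct += sum(predict_forest(trees, X[i]) == y[i] for i in test)
    return correct / len(X)


def tree_rules(node, conds=()):
    """Yield (conditions, class) for every root-to-leaf path; a condition is (feature, 0/1)."""
    if not isinstance(node, tuple):
        yield conds, node
        return
    f, absent, present = node
    yield from tree_rules(absent, conds + ((f, 0),))
    yield from tree_rules(present, conds + ((f, 1),))


def rank_rules(trees, X, y, min_support=5):
    """Rank distinct path rules by (confidence, support) measured on the given sites."""
    scored = {}
    for tree in trees:
        for conds, label in tree_rules(tree):
            key = (tuple(sorted(conds)), label)
            if key in scored:
                continue
            matched = [y[i] for i, x in enumerate(X) if all(x[f] == v for f, v in conds)]
            if len(matched) >= min_support:
                scored[key] = (matched.count(label) / len(matched), len(matched))
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


X, y = make_toy_data()
print("5-fold CV accuracy:", round(cross_val_accuracy(X, y), 3))
forest = fit_forest(X, y)
for (conds, label), (conf, support) in rank_rules(forest, X, y)[:3]:
    rule = " AND ".join(f"{MOTIFS[f]}={'present' if v else 'absent'}" for f, v in conds)
    print(rule, "->", label, f"(confidence={conf:.2f}, support={support})")
```

Treating each tree path as a conjunction of motif presence/absence conditions is what makes the model readable as a "grammar": a high-confidence rule such as "AP1 present AND GATA absent" names a motif combination associated with one cell type. For real data, a library implementation such as scikit-learn's `RandomForestClassifier` would normally replace the hand-written forest, with paths read from each fitted estimator's `tree_` arrays.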