Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest

Abstract

Background: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs (a motif grammar) located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model?

Results: We present a Random Forest (RF) based approach to build a multi-class classifier that predicts the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar.

Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.
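The classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn and NumPy are available, uses purely synthetic motif presence/absence data (the numbers of cell types, motifs, and sites, and the planted motif pairs, are all made up), and uses feature importance as a simple stand-in for the paper's rule mining step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical setup: 3 cell types, 20 motifs, 300 binding sites per cell type.
n_cell_types, n_motifs, n_sites = 3, 20, 300
X_parts, y_parts = [], []
for cell_type in range(n_cell_types):
    # Background: each motif occurs at a site with probability 0.2.
    presence = rng.random((n_sites, n_motifs)) < 0.2
    # Planted toy "grammar": a pair of co-occurring motifs enriched in this cell type.
    enriched = rng.random(n_sites) < 0.8
    presence[enriched, 2 * cell_type] = True
    presence[enriched, 2 * cell_type + 1] = True
    X_parts.append(presence.astype(int))
    y_parts.append(np.full(n_sites, cell_type))
X = np.vstack(X_parts)        # rows: binding sites, columns: motif present (1) or absent (0)
y = np.concatenate(y_parts)   # cell-type label of each binding site

# Multi-class Random Forest evaluated with 5-fold (stratified) cross-validation.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv_accuracy = cross_val_score(clf, X, y, cv=5).mean()

# Fit on all data and rank motifs by importance (a stand-in for rule mining).
clf.fit(X, y)
top_motifs = np.argsort(clf.feature_importances_)[::-1][:6]
print(f"5-fold CV accuracy: {cv_accuracy:.2f}")
print("Most informative motif indices:", sorted(top_motifs.tolist()))
```

On real data, each row would instead be built from motif scans of ChIP-seq peak sequences, and the labels would be the cell types in which each peak was observed.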

Bibliographic details

Authors: Xin Wang; Peijie Lin; Joshua W. K. Ho
Year: 2018
Original format: PDF
Language: English

Similar literature

1. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest [J]. Xin Wang, Peijie Lin, Joshua W. K. Ho. BMC Genomics. 2018, Issue 1
2. Genome-wide computational and expression analyses reveal G-quadruplex DNA motifs as conserved cis-regulatory elements in human and related species [J]. Verma A, Halder K, Halder R, Journal of Medicinal Chemistry. 2008, Issue 18
3. Genome-wide analysis of cis-regulatory element structure and discovery of motif-driven gene co-expression networks in grapevine [J]. Darren Chern Jan Wong, Rodrigo Lopez Gutierrez, Gregory Alan Gambetta, DNA Research: an international journal for rapid publication of reports on genes and genomes. 2017, Issue 3
4. RPPMD (Randomly projected possible motif discovery): An efficient bucketing method for finding DNA planted Motif [C]. Faisal Bin Ashraf, Ali Imam Abir, Md Sirajus Salekin, International Conference on Electrical, Computer and Communication Engineering. 2017
5. Integrating systems biology data repositories to improve cis-regulatory motif characterization and discovery: An elegant solution to an old difficult problem [D]. Quest, Daniel J. 2009
6. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest [O]. Xin Wang, Peijie Lin, Joshua W. K. Ho. 2018
7. Figure 4: (A) One conserved sequence, which occurs 79 times in 46,264 binding site peaks from the ChIP-seq dataset. The mutation profile of this conserved sequence is illustrated, where '_' indicates that the base is unchanged, DEL indicates that the base is lost, and INS X indicates that a new base X is inserted in front of the base. (B) Several repeated element patterns are listed. (C) The first column shows the top five DNA motifs mined by the meme-chip tools (Machanick & Bailey, 2011). The resemblant conserved sequences found by the CFSP algorithm are listed in the second column. The third column lists the position-specific scoring matrices (PSSMs) transformed from the mutational information. The similarity between each meme motif and the resemblant conserved sequence in PSSM format was calculated with the stamp motif comparison tool (Mahony & Benos, 2007). The E-values for the similarity of those pairs are displayed in the fourth column. (D) One motif is selected from each group clustered by gkmsvm descriptors, and the corresponding motif found by the CFSP algorithm is listed below it. (E) Additional datasets (File No: ENCFF100GRL, ENCFF616IRT, ENCFF870CER; Target: SREBF1) were collected from https://www.encodeproject.org. The top two motifs in each file were selected using meme tools, and the corresponding motifs found by our algorithm are listed below them. [O]
