Source: Nucleic Acids Research

Discovery and validation of information theory-based transcription factor and cofactor binding site motifs

Abstract

Data from ChIP-seq experiments can be used to derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes.
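
As a rough illustration of what an information theory-based weight matrix can look like, the sketch below builds a matrix of per-base weights, 2 + log2 f(b, l), from a set of aligned sites and sums those weights to score an individual site in bits. This is a commonly used formulation and only an assumption about the general idea; it is not the authors' pipeline, it omits the recursive entropy minimization and any small-sample correction, it assumes a uniform four-base background, and the toy sites and function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def information_weight_matrix(sites):
    """Build per-position weights 2 + log2(f) from equal-length aligned sites.

    Bases never observed at a position get a weight of -inf
    (no pseudocounts are applied in this sketch).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        column = {}
        for base in BASES:
            freq = counts.get(base, 0) / len(sites)
            column[base] = 2.0 + math.log2(freq) if freq > 0 else float("-inf")
        matrix.append(column)
    return matrix

def site_information(matrix, site):
    """Sum the weights of the bases in one site (its information in bits)."""
    return sum(column[base] for column, base in zip(matrix, site))

def average_information(matrix, sites):
    """Mean site information over the sites used to build the matrix."""
    return sum(site_information(matrix, s) for s in sites) / len(sites)

# Hypothetical toy alignment, for illustration only.
toy_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"]
toy_matrix = information_weight_matrix(toy_sites)
for s in sorted(set(toy_sites)):
    print(s, round(site_information(toy_matrix, s), 3))
```

In this toy case the variant site scores lower than the consensus site, which is the sense in which such a matrix can rank individual binding sites by strength. A sequence variant that replaces a base with one never seen at that position would score -inf here; a real analysis would need a principled way to handle unobserved bases.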
