BMC Bioinformatics

OHMM: a Hidden Markov Model accurately predicting the occupancy of a transcription factor with a self-overlapping binding motif

Abstract

Background: DNA sequence binding motifs for several important transcription factors happen to be self-overlapping. Many current regulatory site identification methods do not explicitly take overlapping sites into account. Moreover, most methods use arbitrary thresholds and fail to provide a biophysical interpretation of statistical quantities. In addition, commonly used approaches do not include the location of a site relative to the transcription start site (TSS) in an integrated probabilistic framework when identifying sites. Ignoring these features can lead to inaccurate predictions as well as incorrect design and interpretation of experimental results.

Results: We have developed a tool based on a Hidden Markov Model (HMM) that identifies the binding locations of transcription factors with a preference for self-overlapping DNA motifs by combining the effects of their alternative binding modes. Interpreting HMM parameters as biophysical quantities, this method uses the occupancy probability of a transcription factor on a DNA sequence as the discriminant function, earning the algorithm the name OHMM: Occupancy via Hidden Markov Model. OHMM learns the classification threshold by training emission probabilities on unaligned sequences containing known sites and by estimating transition probabilities that reflect the site density in all promoters in a genome. While identifying sites, it adjusts parameters to model how site density changes with distance from the transcription start site. Moreover, it provides guidance for designing padding sequences in gel shift experiments. For binding sites of the transcription factor NF-κB, we find that the occupancy probability predicted by OHMM correlates well with the binding affinity measured in gel shift experiments. High evolutionary conservation scores and enrichment in experimentally verified regulated genes suggest that NF-κB binding sites predicted by our method are likely to be functional.

Conclusion: Our method deals specifically with identifying locations with multiple overlapping binding sites by computing the local occupancy of the transcription factor. Moreover, treating OHMM as a biophysical model allows us to learn the classification threshold in a principled manner. Another feature of OHMM is that transition probabilities are allowed to change with location relative to the TSS. OHMM can be used to predict physical occupancy, and it provides guidance for the proper design of gel shift experiments. Based on our predictions, new insights into NF-κB function and regulation, and possible new biological roles of NF-κB, were uncovered.
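To make the idea of occupancy as a discriminant function concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy HMM with one background state and one state per motif position, a single site-start probability `p_site` that is constant along the sequence (the TSS-dependent transitions described in the abstract are omitted), and a hypothetical helper `consensus_pwm` that builds illustrative emission probabilities. The forward-backward posterior sums over all mutually exclusive binding modes, so a base covered by several overlapping placements accumulates occupancy from each of them.

```python
def consensus_pwm(consensus, hit=0.97):
    """Hypothetical helper: toy position weight matrix from a consensus string."""
    miss = (1.0 - hit) / 3.0
    return [{b: (hit if b == c else miss) for b in "ACGT"} for c in consensus]


def occupancy(seq, pwm, background, p_site):
    """Posterior probability that each base is covered by a bound factor.

    seq        : DNA string over ACGT
    pwm        : list of dicts (one per motif position), base -> probability
    background : dict, base -> probability in unbound DNA
    p_site     : assumed probability that a site starts at a given position
    """
    W, n = len(pwm), len(seq)
    S = W + 1  # state 0 = background, states 1..W = motif positions

    def emit(s, base):
        return background[base] if s == 0 else pwm[s - 1][base]

    def trans(a, b):
        # Inside a site the motif position advances deterministically.
        if 0 < a < W:
            return 1.0 if b == a + 1 else 0.0
        # From background or the last motif position: stay unbound or start a site.
        if b == 0:
            return 1.0 - p_site
        return p_site if b == 1 else 0.0

    # Forward pass, rescaled at every position to avoid underflow.
    fwd, scale, prev = [], [], None
    for i, base in enumerate(seq):
        if i == 0:
            start = [1.0 - p_site, p_site] + [0.0] * (W - 1)
            row = [start[s] * emit(s, base) for s in range(S)]
        else:
            row = [sum(prev[a] * trans(a, s) for a in range(S)) * emit(s, base)
                   for s in range(S)]
        c = sum(row)
        prev = [x / c for x in row]
        fwd.append(prev)
        scale.append(c)

    # Backward pass; a path must end unbound or on the last motif position,
    # so sites truncated by the end of the sequence are excluded.
    bwd = [None] * n
    bwd[n - 1] = [1.0 if s in (0, W) else 0.0 for s in range(S)]
    for i in range(n - 2, -1, -1):
        nxt, base = bwd[i + 1], seq[i + 1]
        bwd[i] = [sum(trans(a, b) * emit(b, base) * nxt[b] for b in range(S))
                  / scale[i + 1] for a in range(S)]

    # Occupancy = 1 - posterior probability of the background state.
    occ = []
    for i in range(n):
        joint = [fwd[i][s] * bwd[i][s] for s in range(S)]
        occ.append(1.0 - joint[0] / sum(joint))
    return occ


if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}
    # "GGGG" holds two overlapping placements of the toy motif "GGG".
    for base, o in zip("ATGGGGAT", occupancy("ATGGGGAT", consensus_pwm("GGG"), uniform, 0.01)):
        print(base, round(o, 3))
```

In this toy example the two central G bases are covered by both overlapping placements of the motif, so their occupancy exceeds that of the flanking G bases, which are covered by only one. A classifier in this spirit would threshold the occupancy rather than a per-site match score; the values of `hit` and `p_site` here are arbitrary illustrative choices, not parameters from the paper.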