Journal: Nucleic Acids Research

Integrating genome sequence and structural data for statistical learning to predict transcription factor binding sites

Abstract

We report an approach to predicting the DNA specificity of tetracycline repressor (TetR) family transcription regulators (TFRs). First, a genome sequence-based method was streamlined, with quantitative P-values defined to select reliable predictions. Then, a framework was introduced to incorporate structural data and to train a statistical energy function that scores the pairing between a TFR and a TFR binding site (TFBS) based on their sequences. When the predictions were benchmarked against experiments, the TFBSs for 29 out of 30 TFRs were correctly predicted by either the genome sequence-based or the statistical energy-based method. Using P-values or Z-scores as indicators, we estimate that 59.6% of TFRs are covered by relatively reliable predictions from at least one of the two methods, whereas only 28.7% are covered by the genome sequence-based method alone. Our approach predicts a large number of new TFBSs that cannot be correctly retrieved from public databases such as FootprintDB. High-throughput experimental assays suggest that the statistical energy function can reliably model the TFBSs of a significant number of TFRs. Thus, the energy function may be applied to search for new TFBSs in the respective genomes. The approach could also be extended to other transcription factor families for which sufficient structural information is available.
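The abstract does not state the functional form of the statistical energy function or how its Z-scores are computed. As an illustration only, one common way to score a protein-DNA pairing from sequence is to sum pairwise residue-nucleotide contact energies over contacts taken from a crystal structure, then compare the candidate site's energy with shuffled decoy sites to obtain a Z-score and an empirical P-value. The sketch below assumes that form. The energy table, contact list, and function names (`pair_energy`, `shuffled_decoys`, `score_site`) are hypothetical and are not the authors' trained parameters or code.

```python
import random
import statistics

# Hypothetical residue-nucleotide contact energies (arbitrary units; lower = more favorable).
# A trained statistical energy function would supply these values; these are made up.
ENERGY_TABLE = {
    ("R", "G"): -1.2,
    ("K", "G"): -0.8,
    ("N", "A"): -1.0,
    ("Q", "A"): -0.6,
    ("T", "T"): -0.5,
    ("S", "C"): -0.4,
}


def pair_energy(protein, site, contacts, table=ENERGY_TABLE):
    """Sum contact energies over (protein_index, dna_index) pairs, e.g. from a structure.

    Residue-nucleotide pairs missing from the table contribute 0.
    """
    return sum(table.get((protein[i], site[j]), 0.0) for i, j in contacts)


def shuffled_decoys(site, n, rng):
    """Generate n decoy sites by shuffling the bases of the candidate site (keeps composition)."""
    decoys = []
    for _ in range(n):
        bases = list(site)
        rng.shuffle(bases)
        decoys.append("".join(bases))
    return decoys


def score_site(protein, site, contacts, n_decoys=1000, seed=0):
    """Return (energy, z_score, empirical_p) for one hypothetical TFR/site pairing.

    z_score > 0 means the site scores more favorably (lower energy) than the mean decoy.
    empirical_p is the add-one estimate of the fraction of decoys scoring at least as well.
    """
    rng = random.Random(seed)
    e_site = pair_energy(protein, site, contacts)
    decoy_e = [pair_energy(protein, d, contacts)
               for d in shuffled_decoys(site, n_decoys, rng)]
    mean = statistics.fmean(decoy_e)
    sd = statistics.pstdev(decoy_e)
    z = (mean - e_site) / sd if sd > 0 else 0.0
    p = (1 + sum(e <= e_site for e in decoy_e)) / (1 + n_decoys)
    return e_site, z, p


if __name__ == "__main__":
    # Toy protein and site; the contact map is invented for illustration.
    contacts = [(1, 0), (2, 2), (3, 1), (5, 4), (6, 6)]
    print(score_site("ARKNQTS", "GAGATTC", contacts))
```

Shuffling the site's own bases keeps its nucleotide composition fixed, so a high Z-score reflects the arrangement of bases at the contacted positions rather than overall base content. A genome-scale scan would instead draw decoys from background sequence in the respective genome.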