Conference: Intelligent Systems for Molecular Biology

Assessing phylogenetic motif models for predicting transcription factor binding sites

Abstract

Motivation: A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach uses an extension of the standard position-specific motif model that incorporates phylogenetic information via a phylogenetic tree and a model of evolution. However, these phylogenetic motif models (PMMs) have never been rigorously benchmarked to determine whether they lead to better prediction of TFBSs than simple position weight matrix scanning.

Results: We evaluate three PMM-based prediction algorithms, each of which treats gapped alignments differently, and we compare their prediction accuracy with that of a non-phylogenetic motif scanning approach. Surprisingly, all of these algorithms appear to be inferior to simple motif scanning when accuracy is measured against a gold standard of validated yeast TFBSs. However, the PMM scanners perform much better than simple motif scanning when we abandon the gold standard and instead consider the number of statistically significant sites predicted, using column-shuffled 'random' motifs to measure significance. These results suggest that the common practice of measuring the accuracy of binding site predictors using collections of known sites may be dangerously misleading, since such collections may be missing 'weak' sites, which are exactly the type of sites needed to discriminate among predictors. We then extend our previous theoretical model of the statistical power of PMM-based prediction algorithms to allow for loss of binding sites during evolution, and show that it gives a more accurate upper bound on scanner accuracy. Finally, utilizing our theoretical model, we introduce a new method for predicting the number of real binding sites in a genome.
The results suggest that the number of true sites for a yeast TF is in general several times greater than the number of known sites listed in the Saccharomyces cerevisiae Database (SCPD). Among the three scanning algorithms that we test, the MONKEY algorithm has the highest accuracy for predicting yeast TFBSs.
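The abstract contrasts simple position weight matrix (PWM) scanning with phylogenetic motif models (PMMs), and uses column-shuffled "random" motifs as a significance control. The Python sketch below illustrates those three ideas under stated assumptions; it is not MONKEY or any of the scanners evaluated in the paper. Assumptions: a uniform background distribution, motif column frequencies that already include pseudocounts (no zeros), forward-strand scanning only, an F81 substitution model with unit rate standing in for whatever evolutionary model a real scanner uses, gaps treated as unobserved bases (only one of several possible gap treatments), and a tree written as nested lists of (subtree, branch length) pairs. All function names are hypothetical, and `excess_hits` is only a crude "signal above the shuffled-motif null" count, not the paper's method for estimating the number of true sites.

```python
# Illustrative sketch only: all names are hypothetical. The uniform background,
# the unit-rate F81 model, and treating gaps as unobserved are assumptions.
import math
import random

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}  # assumed background base distribution


def log_odds_pwm(freqs, background=UNIFORM):
    """Per-column base frequencies (pseudocounts already added) -> log2-odds PWM."""
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in freqs]


def pwm_scan(seq, pwm):
    """Score every window of a single sequence with a PWM (forward strand only)."""
    w = len(pwm)
    return [sum(pwm[j][seq[i + j]] for j in range(w)) for i in range(len(seq) - w + 1)]


def count_hits(seq, freqs, threshold):
    """Number of windows whose PWM score reaches the threshold."""
    return sum(s >= threshold for s in pwm_scan(seq, log_odds_pwm(freqs)))


def excess_hits(seq, freqs, threshold, n_shuffles=100, seed=0):
    """Hits of the real motif minus the mean hits of column-shuffled 'random' motifs."""
    rng = random.Random(seed)
    real = count_hits(seq, freqs, threshold)
    null_total = 0
    for _ in range(n_shuffles):
        cols = list(freqs)
        rng.shuffle(cols)  # same columns, permuted order
        null_total += count_hits(seq, cols, threshold)
    return real - null_total / n_shuffles


def f81_prob(x, y, t, pi):
    """F81 substitution probability P(y | x) along a branch of length t."""
    stay = math.exp(-t)
    return stay * (x == y) + (1.0 - stay) * pi[y]


def partial_likelihood(node, column, pi):
    """Felsenstein pruning: for each base x, P(leaf bases below node | node = x)."""
    if isinstance(node, str):  # leaf: a species name
        obs = column.get(node, "-")
        if obs not in BASES:  # gap or missing species: treated as unobserved
            return {x: 1.0 for x in BASES}
        return {x: float(x == obs) for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:  # internal node: list of (subtree, branch length) pairs
        below = partial_likelihood(child, column, pi)
        for x in BASES:
            result[x] *= sum(f81_prob(x, y, t, pi) * below[y] for y in BASES)
    return result


def column_likelihood(tree, column, pi):
    """Likelihood of one alignment column, with the root base drawn from pi."""
    root = partial_likelihood(tree, column, pi)
    return sum(pi[x] * root[x] for x in BASES)


def pmm_score(tree, columns, freqs, background=UNIFORM):
    """Log2 likelihood ratio of an aligned window: motif columns vs background."""
    return sum(
        math.log2(column_likelihood(tree, col, pi) / column_likelihood(tree, col, background))
        for col, pi in zip(columns, freqs)
    )
```

In this sketch each motif column's frequencies serve as the equilibrium distribution of the substitution model, so a column whose species all carry a high-frequency motif base scores positively and a conserved low-frequency base scores negatively. If every branch length is zero and all species share the same base, `pmm_score` reduces exactly to the single-sequence PWM score, which is a quick check that the phylogenetic model generalizes ordinary motif scanning.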