Journal: Bioinformatics

Assessing phylogenetic motif models for predicting transcription factor binding sites

Abstract

Motivation: A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach extends the standard position-specific motif model to incorporate phylogenetic information via a phylogenetic tree and a model of evolution. However, these phylogenetic motif models (PMMs) have never been rigorously benchmarked to determine whether they predict TFBSs better than simple position weight matrix scanning.

Results: We evaluate three PMM-based prediction algorithms, each of which treats gapped alignments differently, and we compare their prediction accuracy with that of a non-phylogenetic motif scanning approach. Surprisingly, all of these algorithms appear to be inferior to simple motif scanning when accuracy is measured using a gold standard of validated yeast TFBSs. However, the PMM scanners perform much better than simple motif scanning when we abandon the gold standard and instead consider the number of statistically significant sites predicted, using column-shuffled ‘random’ motifs to measure significance. These results suggest that the common practice of measuring the accuracy of binding site predictors using collections of known sites may be dangerously misleading, since such collections may be missing ‘weak’ sites, which are exactly the type of sites needed to discriminate among predictors. We then extend our previous theoretical model of the statistical power of PMM-based prediction algorithms to allow for loss of binding sites during evolution, and show that it gives a more accurate upper bound on scanner accuracy. Finally, utilizing our theoretical model, we introduce a new method for predicting the number of real binding sites in a genome. The results suggest that the number of true sites for a yeast TF is in general several times greater than the number of known sites listed in the Saccharomyces cerevisiae Promoter Database (SCPD). Among the three scanning algorithms that we test, the MONKEY algorithm has the highest accuracy for predicting yeast TFBSs.
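As an illustration of the baseline discussed in the abstract, the following is a minimal sketch of simple position weight matrix scanning together with a column-shuffled control motif. It is not the authors' code and does not implement any of the PMM scanners. The function names, the toy count matrix, the uniform background, the pseudocount, and the score threshold are all assumptions chosen for illustration.

```python
import math
import random

BASES = "ACGT"

def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Turn per-column base counts into log2-odds scores (one dict per column)."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]

def shuffle_columns(pwm, rng):
    """Return a copy of the motif with its columns in random order."""
    shuffled = list(pwm)
    rng.shuffle(shuffled)
    return shuffled

def count_hits(sequence, pwm, threshold):
    """Count windows whose score reaches the (hypothetical) threshold."""
    return sum(1 for _, score in scan(sequence, pwm) if score >= threshold)

# Toy motif with consensus TGACTC, built from made-up counts.
consensus = "TGACTC"
toy_counts = [{b: (10 if b == c else 0) for b in BASES} for c in consensus]
motif = log_odds_pwm(toy_counts)
sequence = "AAATGACTCAAA"

# Hits for the real motif versus a column-shuffled control at the same threshold.
control = shuffle_columns(motif, random.Random(0))
print(count_hits(sequence, motif, threshold=9.0))
print(count_hits(sequence, control, threshold=9.0))
```

Column shuffling keeps the width and the overall base composition of the motif but scrambles its positional pattern. Hits from shuffled motifs therefore give a rough estimate of how many sites would score above the threshold by chance.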
