Journal: Nucleic Acids Research

Theoretical and empirical quality assessment of transcription factor-binding motifs

Abstract

Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability in predicting novel binding sites can be far from optimal, owing to the use of a small number of training sites or an inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program 'matrix-quality', that combines theoretical and empirical score distributions to assess the reliability of PSSMs for predicting TF-binding sites. We applied 'matrix-quality' to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip experiments (bacterial and yeast TFs) and ChIP-seq experiments (mouse TFs). The method presented here has many applications, including selecting reliable motifs before scanning sequences, improving motif collections in TF databases, and evaluating motifs discovered using high-throughput data sets.
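
The idea of comparing a theoretical score distribution with an empirical one can be illustrated with a minimal Python sketch. This is not the 'matrix-quality' implementation: the toy count matrix, the uniform background, the pseudocount scheme, the score discretization (`SCALE`) and all function names are assumptions made only for illustration. The theoretical tail probability P(score >= s) is computed exactly under an independent-base background model, and the empirical one is the fraction of windows in a scanned sequence that reach the same score.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
SCALE = 100  # assumption: scores are kept as integers in units of 1/SCALE nats


def log_odds_pssm(counts, background, pseudo=1.0):
    """Turn per-column base counts into integer log-odds weights."""
    pssm = []
    for col in counts:
        total = sum(col[b] for b in ALPHABET) + pseudo
        pssm.append({
            b: round(SCALE * math.log(
                (col[b] + pseudo * background[b]) / total / background[b]))
            for b in ALPHABET
        })
    return pssm


def theoretical_pvalues(pssm, background):
    """Exact P(score >= s) for a site drawn from an i.i.d. background,
    obtained by convolving the score distributions of the columns."""
    dist = {0: 1.0}
    for col in pssm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for b in ALPHABET:
                nxt[s + col[b]] += p * background[b]
        dist = nxt
    pvals, tail = {}, 0.0
    for s in sorted(dist, reverse=True):  # accumulate from the best score down
        tail += dist[s]
        pvals[s] = tail
    return pvals


def scan(pssm, seq):
    """Score every window of the sequence (assumes only A, C, G, T)."""
    w = len(pssm)
    return [sum(col[seq[i + j]] for j, col in enumerate(pssm))
            for i in range(len(seq) - w + 1)]


def empirical_tail(scores, threshold):
    """Fraction of scanned windows scoring at least `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy data (hypothetical): a 4-column motif with consensus TATA.
background = {b: 0.25 for b in ALPHABET}
counts = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 8, "C": 1, "G": 1, "T": 0},
]
pssm = log_odds_pssm(counts, background)
pvals = theoretical_pvalues(pssm, background)
scores = scan(pssm, "GCGTATAAGCGCTATACGGC")

best = max(pvals)
print("theoretical P(score >= best):", pvals[best])
print("empirical   P(score >= best):", empirical_tail(scores, best))
```

In this toy setting, an empirical tail frequency well above the theoretical one at high scores is the kind of signal the abstract describes as enrichment of binding sites in a sequence set; a real analysis would use genome-scale sequences and a fitted background model.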