BMC Bioinformatics

Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data


Abstract

Background

For many years, the binding preferences of transcription factors (TFs) have been described by so-called motifs for the purpose of predicting potential binding sites. These motifs are usually defined mathematically by position weight matrices (PWMs) or similar models. Thousands of motif models are available in public and commercial databases. A researcher who wants to use them, however, must choose among many competing methods for identifying potential binding sites in a genome of interest, and little published information exists on which choices are optimal. A large number of different motif models is now available, along with experimental datasets describing actual TF binding in hundreds of TF-ChIP-seq pairs. Using these resources, we set out to analyze this question comprehensively.

Results

We focus on the task of identifying potential TF binding sites in the human genome. First, we comprehensively compare the coverage and quality of the models available in different databases. We show that public databases have TF coverage comparable to that of commercial databases and better motif performance. Second, we compare different motif scanners and show that, regardless of the database used, the tools developed by the scientific community outperform the commercial tools. Third, we calculate for each motif a detection threshold that optimizes prediction accuracy. Finally, we provide an in-depth comparison of different methods for choosing thresholds for all motifs a priori. Surprisingly, we show that selecting a common false-positive rate gives results that are the least biased by the information content of the motif and are therefore the most uniformly accurate.

Conclusion

We provide a guide for researchers working with transcription factor motifs. The guide is supplemented with detailed results of the analysis and with the benchmark datasets at http://bioputer.mimuw.edu.pl/papers/motifs/.
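The following Python sketch gives a rough illustration of PWM scanning with a cutoff chosen from a common false-positive rate. It first builds a log-odds matrix from position counts. It then computes the exact score distribution of random background sequence and picks the lowest score cutoff whose false-positive rate does not exceed a chosen value. Finally, it scans both strands. The sketch assumes a uniform i.i.d. background, a pseudocount of 0.5, and scores rounded to 0.01 bits. These choices, the function names, and the toy motif are hypothetical and are not taken from the paper, which benchmarked existing motif databases and scanners rather than this code.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
UNIFORM = {b: 0.25 for b in ALPHABET}  # assumed background model


def log_odds_pwm(counts, pseudocount=0.5, background=UNIFORM, resolution=0.01):
    """Convert counts (one {base: count} dict per position) to integer log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + 4 * pseudocount
        # Rounding to multiples of `resolution` keeps the exact score distribution finite.
        pwm.append({
            b: round(math.log2((column[b] + pseudocount) / total / background[b]) / resolution)
            for b in ALPHABET
        })
    return pwm


def information_content(counts, pseudocount=0.5):
    """Total information content in bits, relative to a uniform background."""
    ic = 0.0
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + 4 * pseudocount
        for b in ALPHABET:
            p = (column[b] + pseudocount) / total
            ic += p * math.log2(p / 0.25)
    return ic


def score_distribution(pwm, background=UNIFORM):
    """Exact distribution of window scores when bases are drawn i.i.d. from `background`."""
    dist = {0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for b in ALPHABET:
                nxt[score + column[b]] += prob * background[b]
        dist = nxt
    return dict(dist)


def threshold_for_fpr(pwm, fpr, background=UNIFORM):
    """Lowest attainable score t such that P(score >= t) <= fpr under the background."""
    dist = score_distribution(pwm, background)
    threshold = max(dist) + 1  # nothing passes if even the best score is too frequent
    tail = 0.0
    for score in sorted(dist, reverse=True):
        if tail + dist[score] > fpr:
            break
        tail += dist[score]
        threshold = score
    return threshold


def scan(seq, pwm, threshold):
    """Return (start, strand, score) for every window scoring >= threshold on either strand."""
    # Scoring a window with the reverse-complemented matrix equals scoring the
    # window's reverse complement; the same cutoff applies if the background is strand-symmetric.
    rc_pwm = [{b: col[COMPLEMENT[b]] for b in ALPHABET} for col in reversed(pwm)]
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, matrix in (("+", pwm), ("-", rc_pwm)):
            score = sum(col[base] for col, base in zip(matrix, window))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


# Toy motif with consensus TGAC (made-up counts, for illustration only).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
cutoff = threshold_for_fpr(toy_pwm, fpr=1e-2)
print(cutoff, scan("GGTGACGG", toy_pwm, cutoff))  # prints: 724 [(2, '+', 724)]
```

The achievable score range grows with motif length and information content. A single fixed score cutoff would therefore correspond to very different false-positive rates for different motifs. Deriving each motif's cutoff from a shared FPR, as in this sketch, is one way to put all motifs on a common footing. The FPR here is per position and per strand, so scanning both strands roughly doubles the expected number of background hits.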
