Journal: Briefings in Bioinformatics

Features that define the best ChIP-seq peak calling algorithms


Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and identified 12 features of the two sub-problems that distinguish methods from each other. We picked six methods (GEM, MACS2, MUSIC, BCP, Threshold-based method (TM) and ZINBA) that span this feature space and used a combination of 300 simulated ChIP-seq data sets, 3 real data sets and mathematical analyses to identify the features that allow some methods to perform better than others. We prove that methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes are more powerful than those that do not. For statistical testing of candidate peaks, methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test. BCP and MACS2 have the best operating characteristics on simulated transcription factor binding data. GEM has the highest fraction of the top 500 peaks containing the binding motif of the immunoprecipitated factor, with 50% of its peaks within 10 base pairs of a motif. BCP and MUSIC perform best on histone data. These findings provide guidance and rationale for selecting the best peak caller for a given application.
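To make the Poisson-versus-Binomial distinction concrete, the sketch below scores a single candidate peak both ways. It is an illustration under stated assumptions, not the procedure of any of the six peak callers: the abstract does not give the exact test formulations, so the code uses one common form of each. The Poisson test takes the depth-scaled input count as the expected rate, and the Binomial test asks how many of the pooled ChIP and input reads fall in the ChIP sample. The function names and the read counts are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    # First tail term computed in log space to avoid overflow for large k.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Past the mode the terms shrink; stop once they no longer matter.
        if i > lam and term <= total * 1e-16:
            break
    return min(1.0, total)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return min(1.0, sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    ))


# Hypothetical candidate peak: read counts in one window (illustrative only).
chip_reads = 30
input_reads = 10
chip_depth = 1_000_000   # assumed total mapped ChIP reads
input_depth = 1_000_000  # assumed total mapped input reads

# Poisson test: expected ChIP count is the input count scaled by depth.
lam = input_reads * chip_depth / input_depth
p_poisson = poisson_upper_tail(chip_reads, lam)

# Binomial test: under the null, each pooled read comes from the ChIP
# sample with probability equal to its share of the total depth.
p0 = chip_depth / (chip_depth + input_depth)
p_binomial = binomial_upper_tail(chip_reads, chip_reads + input_reads, p0)

print(f"Poisson p-value:  {p_poisson:.3e}")
print(f"Binomial p-value: {p_binomial:.3e}")
```

In this formulation the Binomial test conditions on the pooled read count in the window, while the Poisson test treats the scaled input count as a fixed rate, so the two can rank the same set of candidate peaks differently.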
