BMC Bioinformatics

Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures


Abstract

Background: When conducting multiple hypothesis tests, it is important to control the number of false positives, commonly through the False Discovery Rate (FDR). However, there is a tradeoff between controlling the FDR and maximizing power. Several methods, such as the q-value method, have been proposed to estimate the proportion of true null hypotheses among the tested hypotheses and to use this estimate in the control of the FDR. These methods usually depend on the assumption that the test statistics are independent (or only weakly correlated). However, many types of data, for example microarray data, often contain large-scale correlation structures. Our objective was to develop methods that control the FDR while maintaining a greater level of power in highly correlated datasets by improving the estimation of the proportion of null hypotheses.

Results: We showed that when strong correlation exists among the data, which is common in microarray datasets, the estimate of the proportion of null hypotheses can be highly variable, resulting in a high level of variation in the FDR. Therefore, we developed a re-sampling strategy that reduces this variation by breaking the correlations between gene expression values, and then uses a conservative rule, selecting the upper quartile of the re-sampling estimates, to obtain strong control of the FDR.

Conclusion: In simulation studies and perturbations of actual microarray datasets, our method, compared to competing methods such as q-value, generated slightly biased estimates of the proportion of null hypotheses but with lower mean square errors. When genes are selected at the same nominal FDR level, our methods have on average a significantly lower false discovery rate in exchange for a minor reduction in power.
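The abstract does not spell out the estimator or the re-sampling scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a Storey-type estimate of the null proportion at a fixed threshold `lam`, a normal-approximation two-sample test, and a re-sampling step in which each gene is tested on its own random subset of arrays so that different genes no longer share exactly the same samples. The function names (`storey_pi0`, `two_sided_pvalue`, `resampled_pi0`) and the parameters `subset_size`, `n_resamples`, and `lam` are hypothetical. Only the final step, taking the upper quartile of the re-sampling estimates, comes directly from the abstract.

```python
import math
import random
import statistics


def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the null proportion: #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def two_sided_pvalue(x, y):
    """Two-sided p-value for a difference in means, using a normal approximation (assumption)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0.0:
        return 1.0  # no variation in either subset: treat as no evidence against the null
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def resampled_pi0(group_a, group_b, subset_size, n_resamples=50, lam=0.5, seed=0):
    """Return (upper quartile of re-sampled pi0 estimates, list of all estimates).

    group_a and group_b are lists of genes; each gene is a list of expression
    values, one per array in that group.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        pvalues = []
        for gene_a, gene_b in zip(group_a, group_b):
            # Assumed scheme: every gene draws its own subset of arrays (without
            # replacement), which weakens the correlation between test statistics.
            sub_a = rng.sample(gene_a, subset_size)
            sub_b = rng.sample(gene_b, subset_size)
            pvalues.append(two_sided_pvalue(sub_a, sub_b))
        estimates.append(storey_pi0(pvalues, lam))
    # Conservative choice described in the abstract: the upper quartile of the estimates.
    upper_quartile = statistics.quantiles(estimates, n=4, method="inclusive")[2]
    return upper_quartile, estimates
```

Calling `resampled_pi0(group_a, group_b, subset_size=6)` on two expression matrices with ten arrays per group would return the conservative estimate together with the individual re-sampling estimates, whose spread gives a rough picture of how variable the estimate is.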
