Multiple hypothesis testing is concerned with appropriately controlling the rate of false positives when testing several hypotheses simultaneously, while maintaining the power of each test as much as possible. One multiple hypothesis testing error measure is the False Discovery Rate (FDR), which is loosely defined to be the expected proportion of false positives among all significant hypotheses. The FDR is especially appropriate for exploratory analyses in which one is interested in finding many significant results among many tests. In this work, we introduce a modified version of the FDR called the “positive False Discovery Rate” (pFDR). We argue that the pFDR is a more appropriate and useful error measure, and we investigate its statistical properties. Assuming the test statistics come from a mixture distribution, we show that the pFDR can be written as a posterior probability and can be connected to classification theory. These properties remain asymptotically true under fairly general conditions, even under certain forms of dependence. We also introduce and investigate a new quantity called the “q-value,” which is a natural “Bayesian p-value,” or rather the pFDR analogue of the p-value. This idea is also generalized to any multiple hypothesis testing error measure. Using these results, we introduce point estimates of the FDR and pFDR for fixed rejection regions. The point estimates provide proper conservative behavior in three scenarios: (1) estimating false discovery rates for fixed rejection regions, (2) estimating rejection regions for fixed false discovery rates, and (3) simultaneously estimating false discovery rates over all possible rejection regions, even under certain forms of dependence. We show that this new methodology extends the current methodology and also provides increases in power. We apply the methodology to the problem of detecting differential gene expression between two or more biological samples based on DNA microarray data. This application is well suited to the methodology because the dependence between the tests (genes) is weak and the number of tests is quite large.
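
To make the point estimates for a fixed rejection region concrete, the following is a minimal sketch in Python. The abstract does not state the estimator's form, so the sketch assumes the plug-in form commonly associated with this line of work: a rejection region [0, t] on the p-value scale, a tuning parameter `lam` used to estimate the proportion of true nulls, and q-values taken as the running minimum of the estimated FDR over larger rejection regions. The function names, the default `lam=0.5`, and the use of the FDR ratio rather than the pFDR ratio inside `q_values` are illustrative assumptions, not details taken from the text.

```python
def estimate_pi0(pvals, lam=0.5):
    """Estimate the proportion of true nulls from p-values above lam (assumed form)."""
    m = len(pvals)
    above = sum(1 for p in pvals if p > lam)
    return min(above / (m * (1.0 - lam)), 1.0)


def estimate_fdr(pvals, t, lam=0.5):
    """Point estimate of the FDR for the fixed rejection region [0, t]."""
    m = len(pvals)
    # Guard against division by zero when nothing is rejected.
    rejected = max(sum(1 for p in pvals if p <= t), 1)
    return min(estimate_pi0(pvals, lam) * m * t / rejected, 1.0)


def estimate_pfdr(pvals, t, lam=0.5):
    """Point estimate of the pFDR for [0, t], with 0 < t <= 1.

    Divides the FDR ratio by an estimate of P(at least one rejection),
    taken here to be 1 - (1 - t)^m (an assumption of this sketch).
    """
    m = len(pvals)
    rejected = max(sum(1 for p in pvals if p <= t), 1)
    prob_any_rejection = 1.0 - (1.0 - t) ** m
    raw = estimate_pi0(pvals, lam) * m * t / (rejected * prob_any_rejection)
    return min(raw, 1.0)


def q_values(pvals, lam=0.5):
    """Estimated q-value for each p-value, returned in the input order.

    Each q-value is the smallest estimated FDR over all rejection
    regions [0, t] that contain the corresponding p-value.
    """
    m = len(pvals)
    pi0 = estimate_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, carrying the minimum along.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    example = [0.001, 0.004, 0.02, 0.03, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9]
    print(estimate_fdr(example, t=0.05))
    print(estimate_pfdr(example, t=0.05))
    print(q_values(example))
```

When the number of tests is large, as in the microarray setting, the factor 1 - (1 - t)^m is close to one, so the FDR and pFDR estimates in this sketch nearly coincide.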