Note: excerpted from online sources.
False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses (type I errors). It is a less conservative procedure for comparison, with greater power than familywise error rate (FWER) control, at a cost of increasing the likelihood of obtaining type I errors.
The q-value is defined to be the FDR analogue of the p-value. The q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to directly estimate q-values rather than fixing a level at which to control the FDR.
So the q-value is what gets used when working with the FDR, much like the p-value. I basically did not understand what follows.
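To make the q-value idea concrete, here is a minimal sketch of one common estimate: the Benjamini-Hochberg adjusted p-value. This is an assumption for illustration, not something the excerpt specifies. It treats every null as potentially true, so it is a conservative estimate, and the function name `bh_adjusted_p_values` is hypothetical.

```python
def bh_adjusted_p_values(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each adjusted value is the smallest FDR level at which that test
    would be rejected by the BH step-up rule (a conservative q-value).
    """
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the result is monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(bh_adjusted_p_values([0.01, 0.04, 0.03, 0.005]))
```

A test is then called significant at FDR level α whenever its adjusted value is at most α.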
Classification of m hypothesis tests
The following table defines some random variables related to the m hypothesis tests.
| | # declared non-significant | # declared significant | Total |
|---|---|---|---|
| # true null hypotheses | U | V | m0 |
| # non-true null hypotheses | T | S | m - m0 |
| Total | m - R | R | m |
- m0 is the number of true null hypotheses
- m - m0 is the number of false null hypotheses
- U is the number of true negatives
- V is the number of false positives
- T is the number of false negatives
- S is the number of true positives
- H1, ..., Hm are the null hypotheses being tested
- In m hypothesis tests of which m0 are true null hypotheses, R is an observable random variable, and S, T, U, and V are unobservable random variables.
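The counts in the table can be tallied as in the sketch below. It assumes the truth of each null hypothesis is known, which only happens in simulations. The function name `classify_tests` and its arguments are hypothetical.

```python
def classify_tests(null_is_true, rejected):
    """Tally U, V, T, S for m hypothesis tests.

    null_is_true[i] -- True if H_i is a true null (unknown in practice).
    rejected[i]     -- True if H_i was declared significant.
    """
    if len(null_is_true) != len(rejected):
        raise ValueError("both sequences must have length m")
    counts = {"U": 0, "V": 0, "T": 0, "S": 0}
    for is_null, is_rejected in zip(null_is_true, rejected):
        if is_null and not is_rejected:
            counts["U"] += 1  # true negative
        elif is_null and is_rejected:
            counts["V"] += 1  # false positive
        elif not is_null and not is_rejected:
            counts["T"] += 1  # false negative
        else:
            counts["S"] += 1  # true positive
    counts["R"] = counts["V"] + counts["S"]   # observable: total rejections
    counts["m0"] = counts["U"] + counts["V"]  # number of true nulls
    counts["m"] = len(rejected)
    return counts


if __name__ == "__main__":
    truth = [True, True, True, False, False]
    decisions = [False, True, False, True, False]
    print(classify_tests(truth, decisions))
```

Only R and m can be read off real data. The other counts depend on which nulls are actually true.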
The false discovery rate is given by $\mathrm{FDR} = E\left[\frac{V}{R}\right] = E\left[\frac{V}{V+S}\right]$, and one wants to keep this value below a threshold α.
($\frac{V}{R}$ is defined to be 0 when R = 0.)
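As a small illustration of the definition, the sketch below computes the false discovery proportion V/R for a single experiment, with the R = 0 convention, and averages it over repeated experiments to approximate the expectation. The function names and the sample (V, R) pairs are made up for illustration.

```python
def false_discovery_proportion(v, r):
    """Return V/R, defined to be 0 when there are no rejections (R = 0)."""
    if r == 0:
        return 0.0
    return v / r


def empirical_fdr(outcomes):
    """Average V/R over repeated experiments given as (V, R) pairs.

    The FDR is the expectation of V/R, so this average approximates it
    when the pairs come from many independent repetitions.
    """
    proportions = [false_discovery_proportion(v, r) for v, r in outcomes]
    return sum(proportions) / len(proportions)


if __name__ == "__main__":
    # Hypothetical (V, R) pairs from four simulated experiments.
    print(empirical_fdr([(0, 0), (1, 4), (2, 4), (0, 2)]))
```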
Controlling procedures
Independent tests
The Simes procedure ensures that its expected value $E\left[\frac{V}{R}\right]$ is less than a given α (Benjamini and Hochberg 1995). This procedure is valid when the m tests are independent. Let $H_1, \ldots, H_m$ be the null hypotheses and $P_1, \ldots, P_m$ their corresponding p-values. Order these values in increasing order and denote them by $P_{(1)}, \ldots, P_{(m)}$. For a given α, find the largest k such that
$$P_{(k)} \le \frac{k}{m}\,\alpha$$
Then reject (i.e. declare positive) all $H_{(i)}$ for $i = 1, \ldots, k$.
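The step-up rule can be sketched as follows: sort the p-values, find the largest rank k whose p-value is at most (k/m)·α, and reject the k smallest. This is a minimal illustration, not a reference implementation. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected.

    Finds the largest k with P_(k) <= (k / m) * alpha and rejects the
    hypotheses holding the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, alpha=0.05))
```

Because k is the largest passing rank, a hypothesis can be rejected even when its own p-value exceeds its own threshold, as long as a larger-ranked p-value passes.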
...Note, the mean α for these m tests is $\frac{\alpha(m+1)}{2m}$, which could be used as a rough FDR (RFDR) or "α adjusted for m indep. tests."
NOTE: The RFDR calculation shown here is not part of the Benjamini and Hochberg method.
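The rough figure can be checked numerically: averaging the per-rank thresholds (k/m)·α over k = 1, ..., m gives α(m+1)/(2m). The sketch below assumes that reading of "mean α". The function name `rough_fdr` is hypothetical.

```python
def rough_fdr(alpha, m):
    """Mean of the per-rank thresholds (k / m) * alpha for k = 1..m."""
    thresholds = [k / m * alpha for k in range(1, m + 1)]
    return sum(thresholds) / m


if __name__ == "__main__":
    alpha, m = 0.05, 10
    # The two printed values should agree up to floating-point rounding.
    print(rough_fdr(alpha, m), alpha * (m + 1) / (2 * m))
```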
Dependent tests
The Benjamini and Yekutieli procedure controls the false discovery rate under dependence assumptions. This refinement modifies the threshold and finds the largest k such that:
$$P_{(k)} \le \frac{k}{m \cdot c(m)}\,\alpha$$
- If the tests are independent: c(m) = 1 (same as above)
- If the tests are positively correlated: c(m) = 1
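- If the tests are negatively correlated (or, more generally, arbitrarily dependent): $c(m) = \sum_{i=1}^{m} \frac{1}{i}$
In the case of negative correlation, c(m) can be approximated by using the Euler-Mascheroni constant γ: $\sum_{i=1}^{m} \frac{1}{i} \approx \ln(m) + \gamma$, where $\gamma \approx 0.57721$.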
Using the RFDR above, an approximate FDR (AFDR) for m dependent tests is the minimum mean α: AFDR = RFDR / (ln(m) + 0.57721...).
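A minimal sketch of the dependent-tests variant is given below. It uses the same step-up search with the threshold divided by c(m), where c(m) defaults to the harmonic sum and can be set to 1 for independent or positively correlated tests. The function names are hypothetical, and the logarithmic approximation is included only for comparison.

```python
import math

EULER_MASCHERONI = 0.5772156649015329


def harmonic_number(m):
    """Exact c(m) = sum_{i=1}^{m} 1/i."""
    return sum(1.0 / i for i in range(1, m + 1))


def approx_harmonic_number(m):
    """Approximation c(m) ~ ln(m) + gamma."""
    return math.log(m) + EULER_MASCHERONI


def benjamini_yekutieli(p_values, alpha=0.05, c_m=None):
    """Return booleans marking rejections for the largest k with
    P_(k) <= k / (m * c(m)) * alpha. Pass c_m=1.0 for the independent case."""
    m = len(p_values)
    if m == 0:
        return []
    if c_m is None:
        c_m = harmonic_number(m)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * alpha:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_yekutieli(p, alpha=0.05))
    print(harmonic_number(10), approx_harmonic_number(10))
```

With c(m) > 1 the thresholds shrink, so this variant rejects no more hypotheses than the independent-case rule at the same α.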