FDR

Note: excerpted from online sources.

False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses (type I errors). It is a less conservative procedure for comparison, with greater power than familywise error rate (FWER) control, at a cost of increasing the likelihood of obtaining type I errors.

The q-value is defined as the FDR analogue of the p-value. The q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to estimate q-values directly rather than fixing a level at which to control the FDR.
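As a rough illustration of "the minimum FDR at which the test may be called significant", the sketch below computes Benjamini-Hochberg adjusted p-values, which are often used as a simple stand-in for q-values. This is an assumption made for illustration: direct q-value estimation usually also estimates the proportion of true null hypotheses, which this sketch does not do. The function name bh_adjusted_p_values is hypothetical.

```python
from typing import List, Sequence


def bh_adjusted_p_values(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value of rank i (1-based, ascending) the adjusted value is
    min over j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A test whose adjusted value is at most α is exactly one that the Benjamini-Hochberg procedure would reject at level α.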

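So the q-value is what is used when computing the FDR, and it plays a role similar to the p-value. I did not really understand most of what follows.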

Classification of m hypothesis tests

The following table defines some random variables related to the m hypothesis tests.

	# declared non-significant	# declared significant	Total
# true null hypotheses	U	V	m₀
# non-true null hypotheses	T	S	m − m₀
Total	m − R	R	m

m₀ is the number of true null hypotheses
m − m₀ is the number of false null hypotheses
U is the number of true negatives
V is the number of false positives
T is the number of false negatives
S is the number of true positives
R = V + S is the number of rejected null hypotheses (tests declared significant)
H₁, ..., H_m are the null hypotheses being tested
In m hypothesis tests of which m₀ are true null hypotheses, R is an observable random variable, and S, T, U, and V are unobservable random variables.

The false discovery rate is given by FDR = E[V / R] = E[V / (V + S)], and one wants to keep this value below a threshold α.

(V / R is defined to be 0 when R = 0.)
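A minimal sketch of the quantity inside the expectation, using the table's V (false positives) and S (true positives). The function name false_discovery_proportion is an assumption for illustration. In practice V and S are unobservable, so this is only computable in simulations where the truth is known.

```python
def false_discovery_proportion(v: int, s: int) -> float:
    """Return V / R with R = V + S, using the convention V / R = 0 when R = 0."""
    r = v + s
    if r == 0:
        # No hypotheses were rejected, so there are no false discoveries.
        return 0.0
    return v / r
```

The FDR is the expected value of this proportion over repeated experiments, not the value from any single experiment.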

Controlling procedures

Independent tests

The Simes procedure ensures that its expected value E[V / (V + S)] is at most a given α (Benjamini and Hochberg 1995). This procedure is valid when the m tests are independent. Let H₁, ..., H_m be the null hypotheses and P₁, ..., P_m their corresponding p-values. Order these values in increasing order and denote them by P_(1), ..., P_(m). For a given α, find the largest k such that

P_(k) ≤ (k / m) · α

Then reject (i.e. declare positive) all H_(i) for i = 1, ..., k.
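The step-up rule can be sketched as follows: sort the p-values, find the largest rank k whose p-value is at most (k / m) · α, and reject the k hypotheses with the smallest p-values. The function name benjamini_hochberg and the default alpha = 0.05 are assumptions for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return a reject / do-not-reject flag for each p-value, in input order."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with P_(k) <= (k / m) * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected
```

For the p-values 0.02, 0.03, 0.035, 0.9 and α = 0.05, the thresholds are 0.0125, 0.025, 0.0375, 0.05. Only the third p-value meets its own threshold, so k = 3 and the first three hypotheses are all rejected, even though the first two exceed their individual thresholds. This is what "largest k" means in a step-up procedure.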

...Note, the mean α for these m tests is α(m + 1) / (2m), which could be used as a rough FDR (RFDR) or "α adjusted for m indep. tests."

NOTE: The RFDR calculation shown here is not part of the Benjamini and Hochberg method.
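A small arithmetic check of the RFDR note, assuming the per-rank thresholds are k · α / m for k = 1, ..., m: their mean is α(m + 1) / (2m). The function names rough_fdr and mean_threshold are hypothetical.

```python
def rough_fdr(alpha: float, m: int) -> float:
    """Closed form for the mean threshold: alpha * (m + 1) / (2 * m)."""
    return alpha * (m + 1) / (2 * m)


def mean_threshold(alpha: float, m: int) -> float:
    """Mean of the per-rank thresholds k * alpha / m for k = 1..m."""
    return sum(k * alpha / m for k in range(1, m + 1)) / m
```

The two functions agree up to floating-point rounding, and for large m the value approaches α / 2.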

Dependent tests

The Benjamini and Yekutieli procedure controls the false discovery rate under dependence assumptions. This refinement modifies the threshold and finds the largest k such that:

P_(k) ≤ (k / (m · c(m))) · α

If the tests are independent: c(m) = 1 (same as above)
If the tests are positively correlated: c(m) = 1
If the tests are negatively correlated: c(m) = 1 + 1/2 + ... + 1/m (the sum of 1/i for i = 1, ..., m)

In the case of negative correlation, c(m) can be approximated by using the Euler-Mascheroni constant γ:

1 + 1/2 + ... + 1/m ≈ ln(m) + γ

Using RFDR above, an approximate FDR (AFDR) is the min(mean α) for m dependent tests = RFDR / (ln(m) + 0.57721...).
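A minimal sketch of the modified threshold, assuming c(m) is the harmonic sum 1 + 1/2 + ... + 1/m in the negatively correlated case and 1 otherwise. The names harmonic_number and benjamini_yekutieli and the flag use_harmonic_correction are hypothetical, chosen for illustration.

```python
import math
from typing import List, Sequence

EULER_MASCHERONI = 0.5772156649015329


def harmonic_number(m: int) -> float:
    """c(m) = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))


def benjamini_yekutieli(
    p_values: Sequence[float],
    alpha: float = 0.05,
    use_harmonic_correction: bool = True,
) -> List[bool]:
    """Step-up procedure with threshold (k / (m * c(m))) * alpha."""
    m = len(p_values)
    c_m = harmonic_number(m) if use_harmonic_correction else 1.0
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k whose p-value is at most the corrected threshold.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / (m * c_m) * alpha:
            k = rank

    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected
```

With use_harmonic_correction=False the rule reduces to the independent-tests threshold (k / m) · α. For m = 1000, harmonic_number(m) and math.log(m) + EULER_MASCHERONI differ by about 0.0005, which shows how close the approximation is.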
