00. .. [2] Benjamini, Yoav, and Daniel Yekutieli. "The control of the false discovery rate in multiple testing under dependency." Annals of statistics (2001): 1165-1188. .. [3] TileStats. FDR - Benjamini-Hochberg explained - Youtube. https://www.youtube.com/watch?v=rZKa4tW2NKs. .. [4] Neuhaus, Karl-Ludwig, et al. "Improved thrombolysis in acute myocardial infarction with front-loaded administration of alteplase: results of the rt-PA-APSAC patency study (TAPS)." Journal of the American College of Cardiology 19.5 (1992): 885-891. Examples -------- We follow the example from [1]_. Thrombolysis with recombinant tissue-type plasminogen activator (rt-PA) and anisoylated plasminogen streptokinase activator (APSAC) in myocardial infarction has been proved to reduce mortality. [4]_ investigated the effects of a new front-loaded administration of rt-PA versus those obtained with a standard regimen of APSAC, in a randomized multicentre trial in 421 patients with acute myocardial infarction. There were four families of hypotheses tested in the study, the last of which was "cardiac and other events after the start of thrombolitic treatment". FDR control may be desired in this family of hypotheses because it would not be appropriate to conclude that the front-loaded treatment is better if it is merely equivalent to the previous treatment. The p-values corresponding with the 15 hypotheses in this family were >>> ps = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344, ... 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000] If the chosen significance level is 0.05, we may be tempted to reject the null hypotheses for the tests corresponding with the first nine p-values, as the first nine p-values fall below the chosen significance level. However, this would ignore the problem of "multiplicity": if we fail to correct for the fact that multiple comparisons are being performed, we are more likely to incorrectly reject true null hypotheses. One approach to the multiplicity problem is to control the family-wise error rate (FWER), that is, the rate at which the null hypothesis is rejected when it is actually true. A common procedure of this kind is the Bonferroni correction [1]_. We begin by multiplying the p-values by the number of hypotheses tested. >>> import numpy as np >>> np.array(ps) * len(ps) array([1.5000e-03, 6.0000e-03, 2.8500e-02, 1.4250e-01, 3.0150e-01, 4.1700e-01, 4.4700e-01, 5.1600e-01, 6.8850e-01, 4.8600e+00, 6.3930e+00, 8.5785e+00, 9.7920e+00, 1.1385e+01, 1.5000e+01]) To control the FWER at 5%, we reject only the hypotheses corresponding with adjusted p-values less than 0.05. In this case, only the hypotheses corresponding with the first three p-values can be rejected. According to [1]_, these three hypotheses concerned "allergic reaction" and "two different aspects of bleeding." An alternative approach is to control the false discovery rate: the expected fraction of rejected null hypotheses that are actually true. The advantage of this approach is that it typically affords greater power: an increased rate of rejecting the null hypothesis when it is indeed false. To control the false discovery rate at 5%, we apply the Benjamini-Hochberg p-value adjustment. >>> from scipy import stats >>> stats.false_discovery_control(ps) array([0.0015 , 0.003 , 0.0095 , 0.035625 , 0.0603 , 0.06385714, 0.06385714, 0.0645 , 0.0765 , 0.486 , 0.58118182, 0.714875 , 0.75323077, 0.81321429, 1. ]) Now, the first *four* adjusted p-values fall below 0.05, so we would reject the null hypotheses corresponding with these *four* p-values. Rejection of the fourth null hypothesis was particularly important to the original study as it led to the conclusion that the new treatment had a "substantially lower in-hospital mortality rate." r