    (\frac{1}{c_1} + \frac{1}{c_2}) } }

with :math:`\hat{p}_1, \hat{p}_2` and :math:`\hat{p}` the estimators of
:math:`p_1, p_2` and :math:`p`, the latter being the combined probability,
given the assumption that :math:`p_1 = p_2`.

If this assumption is invalid (``pooled = False``), the statistic is:

.. math::

    T(X) = \frac{
        \hat{p}_1 - \hat{p}_2
    }{
        \sqrt{
            \frac{\hat{p}_1 (1 - \hat{p}_1)}{c_1} +
            \frac{\hat{p}_2 (1 - \hat{p}_2)}{c_2}
        }
    }

The p-value is then computed as:

.. math::

    \sum
        \binom{c_1}{x_{11}}
        \binom{c_2}{x_{12}}
        \pi^{x_{11} + x_{12}}
        (1 - \pi)^{t - x_{11} - x_{12}}

where the sum is over all 2x2 contingency tables :math:`X` such that:

* :math:`T(X) \leq T(X_0)` when `alternative` = "less",
* :math:`T(X) \geq T(X_0)` when `alternative` = "greater", or
* :math:`|T(X)| \geq |T(X_0)|` when `alternative` = "two-sided".

Above, :math:`c_1, c_2` are the sums of columns 1 and 2, and :math:`t` is
the total (the sum of the 4 elements of the sample).

The returned p-value is the maximum p-value taken over the nuisance
parameter :math:`\pi`, where :math:`0 \leq \pi \leq 1`.

This function's complexity is :math:`O(n c_1 c_2)`, where `n` is the
number of sample points.

References
----------
.. [1] Barnard, G. A. "Significance Tests for 2x2 Tables". *Biometrika*.
       34.1/2 (1947): 123-138. :doi:`dpgkg3`

.. [2] Mehta, Cyrus R., and Pralay Senchaudhuri. "Conditional versus
       unconditional exact tests for comparing two binomials."
       *Cytel Software Corporation* 675 (2003): 1-5.

.. [3] "Wald Test". *Wikipedia*. https://en.wikipedia.org/wiki/Wald_test

Examples
--------
An example use of Barnard's test is presented in [2]_.

Consider the following example of a vaccine efficacy study
(Chan, 1998). In a randomized clinical trial of 30 subjects, 15 were
inoculated with a recombinant DNA influenza vaccine and the other 15 were
inoculated with a placebo.
Twelve of the 15 subjects in the placebo
group (80%) eventually became infected with influenza, whereas in the
vaccine group only 7 of the 15 subjects (47%) became infected. The
data are tabulated as a 2 x 2 table::

            Vaccine  Placebo
    Yes     7        12
    No      8        3

When working with statistical hypothesis testing, we usually choose a
threshold probability, or significance level, on which we base the decision
to reject the null hypothesis :math:`H_0`. Suppose we choose the common
significance level of 5%.

Our alternative hypothesis is that the vaccine will lower the chance of
becoming infected with the virus; that is, the probability :math:`p_1` of
catching the virus with the vaccine will be *less than* the probability
:math:`p_2` of catching the virus without the vaccine. Therefore, we call
`barnard_exact` with the ``alternative="less"`` option:

>>> import scipy.stats as stats
>>> res = stats.barnard_exact([[7, 12], [8, 3]], alternative="less")
>>> res.statistic
-1.894...
>>> res.pvalue
0.03407...

Under the null hypothesis that the vaccine will not lower the chance of
becoming infected, the probability of obtaining test results at least as
extreme as the observed data is approximately 3.4%. Since this p-value is
less than our chosen significance level, we have evidence to reject
:math:`H_0` in favor of the alternative.

Suppose we had used Fisher's exact test instead:

>>> _, pvalue = stats.fisher_exact([[7, 12], [8, 3]], alternative="less")
>>> pvalue
0.0640...

With the same significance level of 5%, we would not have been able to
reject the null hypothesis in favor of the alternative. As stated in [2]_,
Barnard's test is uniformly more powerful than Fisher's exact test because
Barnard's test does not condition on any margin. Fisher's test should only
be used when both sets of marginals are fixed.
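
As a rough cross-check of the p-value formula given in the Notes, the
sketch below evaluates the sum directly on a grid of :math:`\pi` values,
for the pooled statistic with ``alternative="less"`` only. The helper name
``barnard_less_pvalue`` and its ``num_pi`` argument are hypothetical and
exist only for this illustration. The sketch makes no claim to mirror the
internals of `barnard_exact`, and assigning :math:`T = 0` to degenerate
tables (zero pooled variance) is an assumption.

```python
import numpy as np
from scipy.stats import binom


def barnard_less_pvalue(table, num_pi=1001):
    """Brute-force sketch: pooled Wald statistic, alternative="less".

    Hypothetical helper for illustration; not part of SciPy.
    """
    table = np.asarray(table)
    c1, c2 = table.sum(axis=0)      # column totals c_1, c_2
    x11_obs, x12_obs = table[0]     # observed first-row counts

    # Enumerate every table with the same column totals: each table is
    # determined by its first row (x11, x12).
    x11, x12 = np.meshgrid(
        np.arange(c1 + 1), np.arange(c2 + 1), indexing="ij"
    )

    # Pooled Wald statistic T(X) for every enumerated table.
    p1, p2 = x11 / c1, x12 / c2
    p = (x11 + x12) / (c1 + c2)
    var = p * (1 - p) * (1 / c1 + 1 / c2)
    with np.errstate(divide="ignore", invalid="ignore"):
        t_all = (p1 - p2) / np.sqrt(var)
    # Assumption: tables with zero pooled variance get T = 0.
    t_all = np.where(var == 0, 0.0, t_all)

    # Tables at least as extreme as the observed one ("less").
    extreme = t_all <= t_all[x11_obs, x12_obs]

    # For each pi, sum the product of the two binomial probabilities
    # over the extreme tables, then take the maximum over the grid.
    best = 0.0
    for pi in np.linspace(0.0, 1.0, num_pi):
        prob = binom.pmf(x11, c1, pi) * binom.pmf(x12, c2, pi)
        best = max(best, float(prob[extreme].sum()))
    return best
```

For the vaccine table, ``barnard_less_pvalue([[7, 12], [8, 3]])`` should
land close to the ``res.pvalue`` shown above, up to the resolution of the
:math:`\pi` grid. A grid maximum can only underestimate the true supremum,
so a finer grid or a proper optimizer is preferable for real use.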