e distributions from which the samples are drawn have the same finite variance. The original formulation of the test was for samples of equal size [6]_. In case of unequal sample sizes, the test uses the Tukey-Kramer method [4]_. References ---------- .. [1] NIST/SEMATECH e-Handbook of Statistical Methods, "7.4.7.1. Tukey's Method." https://www.itl.nist.gov/div898/handbook/prc/section4/prc471.htm, 28 November 2020. .. [2] Abdi, Herve & Williams, Lynne. (2021). "Tukey's Honestly Significant Difference (HSD) Test." https://personal.utdallas.edu/~herve/abdi-HSD2010-pretty.pdf .. [3] "One-Way ANOVA Using SAS PROC ANOVA & PROC GLM." SAS Tutorials, 2007, www.stattutorials.com/SAS/TUTORIAL-PROC-GLM.htm. .. [4] Kramer, Clyde Young. "Extension of Multiple Range Tests to Group Means with Unequal Numbers of Replications." Biometrics, vol. 12, no. 3, 1956, pp. 307-310. JSTOR, www.jstor.org/stable/3001469. Accessed 25 May 2021. .. [5] NIST/SEMATECH e-Handbook of Statistical Methods, "7.4.3.3. The ANOVA table and tests of hypotheses about means" https://www.itl.nist.gov/div898/handbook/prc/section4/prc433.htm, 2 June 2021. .. [6] Tukey, John W. "Comparing Individual Means in the Analysis of Variance." Biometrics, vol. 5, no. 2, 1949, pp. 99-114. JSTOR, www.jstor.org/stable/3001913. Accessed 14 June 2021. Examples -------- Here are some data comparing the time to relief of three brands of headache medicine, reported in minutes. Data adapted from [3]_. >>> import numpy as np >>> from scipy.stats import tukey_hsd >>> group0 = [24.5, 23.5, 26.4, 27.1, 29.9] >>> group1 = [28.4, 34.2, 29.5, 32.2, 30.1] >>> group2 = [26.1, 28.3, 24.3, 26.2, 27.8] We would like to see if the means between any of the groups are significantly different. First, visually examine a box and whisker plot. >>> import matplotlib.pyplot as plt >>> fig, ax = plt.subplots(1, 1) >>> ax.boxplot([group0, group1, group2]) >>> ax.set_xticklabels(["group0", "group1", "group2"]) # doctest: +SKIP >>> ax.set_ylabel("mean") # doctest: +SKIP >>> plt.show() From the box and whisker plot, we can see overlap in the interquartile ranges group 1 to group 2 and group 3, but we can apply the ``tukey_hsd`` test to determine if the difference between means is significant. We set a significance level of .05 to reject the null hypothesis. >>> res = tukey_hsd(group0, group1, group2) >>> print(res) Tukey's HSD Pairwise Group Comparisons (95.0% Confidence Interval) Comparison Statistic p-value Lower CI Upper CI (0 - 1) -4.600 0.014 -8.249 -0.951 (0 - 2) -0.260 0.980 -3.909 3.389 (1 - 0) 4.600 0.014 0.951 8.249 (1 - 2) 4.340 0.020 0.691 7.989 (2 - 0) 0.260 0.980 -3.389 3.909 (2 - 1) -4.340 0.020 -7.989 -0.691 The null hypothesis is that each group has the same mean. The p-value for comparisons between ``group0`` and ``group1`` as well as ``group1`` and ``group2`` do not exceed .05, so we reject the null hypothesis that they have the same means. The p-value of the comparison between ``group0`` and ``group2`` exceeds .05, so we accept the null hypothesis that there is not a significant difference between their means. We can also compute the confidence interval associated with our chosen confidence level. >>> group0 = [24.5, 23.5, 26.4, 27.1, 29.9] >>> group1 = [28.4, 34.2, 29.5, 32.2, 30.1] >>> group2 = [26.1, 28.3, 24.3, 26.2, 27.8] >>> result = tukey_hsd(group0, group1, group2) >>> conf = res.confidence_interval(confidence_level=.99) >>> for ((i, j), l) in np.ndenumerate(conf.low): ... # filter out self comparisons ... if i != j: ... h = conf.high[i,j] ... print(f"({i} - {j}) {l:>6.3f} {h:>6.3f}") (0 - 1) -9.480 0.280 (0 - 2) -5.140 4.620 (1 - 0) -0.280 9.480 (1 - 2) -0.540 9.220 (2 - 0) -4.620 5.140 (2 - 1) -9.220 0.540 r