rallel. This evaluation is carried out as ``workers(func, iterable)``. Requires that `func` be pickleable. The default is ``1``. is_twosamp : bool, optional If `True`, a two sample test will be run. If ``x`` and ``y`` have shapes ``(n, p)`` and ``(m, p)``, this optional will be overridden and set to ``True``. Set to ``True`` if ``x`` and ``y`` both have shapes ``(n, p)`` and a two sample test is desired. The default is ``False``. Note that this will not run if inputs are distance matrices. random_state : {None, int, `numpy.random.Generator`, `numpy.random.RandomState`}, optional If `seed` is None (or `np.random`), the `numpy.random.RandomState` singleton is used. If `seed` is an int, a new ``RandomState`` instance is used, seeded with `seed`. If `seed` is already a ``Generator`` or ``RandomState`` instance then that instance is used. Returns ------- res : MGCResult An object containing attributes: statistic : float The sample MGC test statistic within ``[-1, 1]``. pvalue : float The p-value obtained via permutation. mgc_dict : dict Contains additional useful results: - mgc_map : ndarray A 2D representation of the latent geometry of the relationship. - opt_scale : (int, int) The estimated optimal scale as a ``(x, y)`` pair. - null_dist : list The null distribution derived from the permuted matrices. See Also -------- pearsonr : Pearson correlation coefficient and p-value for testing non-correlation. kendalltau : Calculates Kendall's tau. spearmanr : Calculates a Spearman rank-order correlation coefficient. Notes ----- A description of the process of MGC and applications on neuroscience data can be found in [1]_. It is performed using the following steps: #. Two distance matrices :math:`D^X` and :math:`D^Y` are computed and modified to be mean zero columnwise. This results in two :math:`n \times n` distance matrices :math:`A` and :math:`B` (the centering and unbiased modification) [3]_. #. For all values :math:`k` and :math:`l` from :math:`1, ..., n`, * The :math:`k`-nearest neighbor and :math:`l`-nearest neighbor graphs are calculated for each property. Here, :math:`G_k (i, j)` indicates the :math:`k`-smallest values of the :math:`i`-th row of :math:`A` and :math:`H_l (i, j)` indicates the :math:`l` smallested values of the :math:`i`-th row of :math:`B` * Let :math:`\circ` denotes the entry-wise matrix product, then local correlations are summed and normalized using the following statistic: .. math:: c^{kl} = \frac{\sum_{ij} A G_k B H_l} {\sqrt{\sum_{ij} A^2 G_k \times \sum_{ij} B^2 H_l}} #. The MGC test statistic is the smoothed optimal local correlation of :math:`\{ c^{kl} \}`. Denote the smoothing operation as :math:`R(\cdot)` (which essentially set all isolated large correlations) as 0 and connected large correlations the same as before, see [3]_.) MGC is, .. math:: MGC_n (x, y) = \max_{(k, l)} R \left(c^{kl} \left( x_n, y_n \right) \right) The test statistic returns a value between :math:`(-1, 1)` since it is normalized. The p-value returned is calculated using a permutation test. This process is completed by first randomly permuting :math:`y` to estimate the null distribution and then calculating the probability of observing a test statistic, under the null, at least as extreme as the observed test statistic. MGC requires at least 5 samples to run with reliable results. It can also handle high-dimensional data sets. In addition, by manipulating the input data matrices, the two-sample testing problem can be reduced to the independence testing problem [4]_. Given sample data :math:`U` and :math:`V` of sizes :math:`p \times n` :math:`p \times m`, data matrix :math:`X` and :math:`Y` can be created as follows: .. math:: X = [U | V] \in \mathcal{R}^{p \times (n + m)} Y = [0_{1 \times n} | 1_{1 \times m}] \in \mathcal{R}^{(n + m)} Then, the MGC statistic can be calculated as normal. This methodology can be extended to similar tests such as distance correlation [4]_. .. versionadded:: 1.4.0 References ---------- .. [1] Vogelstein, J. T., Bridgeford, E. W., Wang, Q., Priebe, C. E., Maggioni, M., & Shen, C. (2019). Discovering and deciphering relationships across disparate data modalities. ELife. .. [2] Panda, S., Palaniappan, S., Xiong, J., Swaminathan, A., Ramachandran, S., Bridgeford, E. W., ... Vogelstein, J. T. (2019). mgcpy: A Comprehensive High Dimensional Independence Testing Python Package. :arXiv:`1907.02088` .. [3] Shen, C., Priebe, C.E., & Vogelstein, J. T. (2019). From distance correlation to multiscale graph correlation. Journal of the American Statistical Association. .. [4] Shen, C. & Vogelstein, J. T. (2018). The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing. :arXiv:`1806.05514` Examples -------- >>> import numpy as np >>> from scipy.stats import multiscale_graphcorr >>> x = np.arange(100) >>> y = x >>> res = multiscale_graphcorr(x, y) >>> res.statistic, res.pvalue (1.0, 0.001) To run an unpaired two-sample test, >>> x = np.arange(100) >>> y = np.arange(79) >>> res = multiscale_graphcorr(x, y) >>> res.statistic, res.pvalue # doctest: +SKIP (0.033258146255703246, 0.023) or, if shape of the inputs are the same, >>> x = np.arange(100) >>> y = x >>> res = multiscale_graphcorr(x, y, is_twosamp=True) >>> res.statistic, res.pvalue # doctest: +SKIP (-0.008021809890200488, 1.0) z