- ``j = (q*n + m - 1) // 1``, and
- ``g = (q*n + m - 1) % 1``,

where ``m`` may be defined according to several different conventions.
The preferred convention may be selected using the ``method`` parameter:

=============================== =============== ===============
``method``                      number in H&F   ``m``
=============================== =============== ===============
``interpolated_inverted_cdf``   4               ``0``
``hazen``                       5               ``1/2``
``weibull``                     6               ``q``
``linear`` (default)            7               ``1 - q``
``median_unbiased``             8               ``q/3 + 1/3``
``normal_unbiased``             9               ``q/4 + 3/8``
=============================== =============== ===============

Note that indices ``j`` and ``j + 1`` are clipped to the range ``0`` to
``n - 1`` when the results of the formula would be outside the allowed
range of non-negative indices. The ``- 1`` in the formulas for ``j`` and
``g`` accounts for Python's 0-based indexing.

The table above includes only the estimators from H&F that are continuous
functions of probability `q` (estimators 4-9). NumPy also provides the
three discontinuous estimators from H&F (estimators 1-3), where ``j`` is
defined as above, ``m`` is defined as follows, and ``g`` is a function
of the real-valued ``index = q*n + m - 1`` and ``j``.

1. ``inverted_cdf``: ``m = 0`` and ``g = int(index - j > 0)``
2. ``averaged_inverted_cdf``: ``m = 0`` and
   ``g = (1 + int(index - j > 0)) / 2``
3. ``closest_observation``: ``m = -1/2`` and
   ``g = 1 - int((index == j) & (j%2 == 1))``

For backward compatibility with previous versions of NumPy, `quantile`
provides four additional discontinuous estimators. Like
``method='linear'``, all have ``m = 1 - q`` so that
``j = q*(n-1) // 1``, but ``g`` is defined as follows.

- ``lower``: ``g = 0``
- ``midpoint``: ``g = 0.5``
- ``higher``: ``g = 1``
- ``nearest``: ``g = (q*(n-1) % 1) > 0.5``

**Weighted quantiles:**
More formally, the quantile at probability level :math:`q` of a cumulative
distribution function :math:`F(y)=P(Y \leq y)` with probability measure
:math:`P` is defined as any number :math:`x` that fulfills the
*coverage conditions*

.. math:: P(Y < x) \leq q \quad\text{and}\quad P(Y \leq x) \geq q

with random variable :math:`Y\sim P`.
Sample quantiles, the result of `quantile`, provide nonparametric
estimates of the underlying population counterparts, represented by the
unknown :math:`F`, given a data vector `a` of length ``n``.

Some of the estimators above arise when one considers :math:`F` as the
empirical distribution function of the data, i.e.
:math:`F(y) = \frac{1}{n} \sum_i 1_{a_i \leq y}`.
Then, different methods correspond to different choices of :math:`x` that
fulfill the above coverage conditions. Methods that follow this approach
are ``inverted_cdf`` and ``averaged_inverted_cdf``.

For weighted quantiles, the coverage conditions still hold. The
empirical cumulative distribution function is simply replaced by its
weighted version, i.e.
:math:`P(Y \leq t) = \frac{1}{\sum_i w_i} \sum_i w_i 1_{a_i \leq t}`,
where :math:`w_i` is the weight of observation :math:`a_i`.
Only ``method="inverted_cdf"`` supports weights.

Examples
--------
>>> import numpy as np
>>> a = np.array([[10, 7, 4], [3, 2, 1]])
>>> a
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> np.quantile(a, 0.5)
3.5
>>> np.quantile(a, 0.5, axis=0)
array([6.5, 4.5, 2.5])
>>> np.quantile(a, 0.5, axis=1)
array([7., 2.])
>>> np.quantile(a, 0.5, axis=1, keepdims=True)
array([[7.],
       [2.]])
>>> m = np.quantile(a, 0.5, axis=0)
>>> out = np.zeros_like(m)
>>> np.quantile(a, 0.5, axis=0, out=out)
array([6.5, 4.5, 2.5])
>>> m
array([6.5, 4.5, 2.5])
>>> b = a.copy()
>>> np.quantile(b, 0.5, axis=1, overwrite_input=True)
array([7., 2.])
>>> assert not np.all(a == b)

See also `numpy.percentile` for a visualization of most methods.

References
----------
.. [1] R. J. Hyndman and Y. Fan,
   "Sample quantiles in statistical packages,"
   The American Statistician, 50(4), pp. 361-365, 1996
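
As a cross-check on the general formula ``(1-g)*y[j] + g*y[j+1]`` with ``j`` and ``g`` taken from ``q*n + m - 1``, the continuous estimators 4-9 can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not NumPy's implementation: ``hf_quantile`` and ``M_OF_Q`` are hypothetical names, the input is assumed to be a non-empty 1-D sequence, the discontinuous estimators are not covered, and the floating-point evaluation order differs from `quantile`, so results may disagree in the last bits.

```python
import math

# m(q) for the continuous H&F estimators 4-9, taken from the table in the Notes.
M_OF_Q = {
    "interpolated_inverted_cdf": lambda q: 0.0,
    "hazen": lambda q: 0.5,
    "weibull": lambda q: q,
    "linear": lambda q: 1.0 - q,
    "median_unbiased": lambda q: q / 3 + 1 / 3,
    "normal_unbiased": lambda q: q / 4 + 3 / 8,
}


def hf_quantile(data, q, method="linear"):
    """Hypothetical helper: (1-g)*y[j] + g*y[j+1] with j, g from q*n + m - 1."""
    if len(data) == 0:
        raise ValueError("data must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    y = sorted(data)                       # sorted copy of the sample
    n = len(y)
    index = q * n + M_OF_Q[method](q) - 1  # real-valued, 0-based
    j = math.floor(index)                  # j = index // 1
    g = index - j                          # g = index % 1, in [0, 1)
    # Clip j and j + 1 to the valid range 0 .. n - 1.
    lo = min(max(j, 0), n - 1)
    hi = min(max(j + 1, 0), n - 1)
    return (1 - g) * y[lo] + g * y[hi]
```

With the example array flattened, ``hf_quantile([10, 7, 4, 3, 2, 1], 0.5)`` gives ``3.5``, the same value as ``np.quantile(a, 0.5)`` in the Examples. Clipping both ``j`` and ``j + 1`` is what makes estimators such as ``weibull`` return the sample minimum at ``q = 0`` and the maximum at ``q = 1``.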