random.

    - `'random'`: non-negative random matrices, scaled with:
      `sqrt(X.mean() / n_components)`

    - `'nndsvd'`: Nonnegative Double Singular Value Decomposition (NNDSVD)
      initialization (better for sparseness)

    - `'nndsvda'`: NNDSVD with zeros filled with the average of X
      (better when sparsity is not desired)

    - `'nndsvdar'` NNDSVD with zeros filled with small random values
      (generally faster, less accurate alternative to NNDSVDa
      for when sparsity is not desired)

    - `'custom'`: Use custom matrices `W` and `H` which must both be provided.

    .. versionchanged:: 1.1
        When `init=None` and n_components is less than n_samples and n_features
        defaults to `nndsvda` instead of `nndsvd`.

solver : {'cd', 'mu'}, default='cd'
    Numerical solver to use:

    - 'cd' is a Coordinate Descent solver.
    - 'mu' is a Multiplicative Update solver.

    .. versionadded:: 0.17
       Coordinate Descent solver.

    .. versionadded:: 0.19
       Multiplicative Update solver.

beta_loss : float or {'frobenius', 'kullback-leibler',             'itakura-saito'}, default='frobenius'
    Beta divergence to be minimized, measuring the distance between X
    and the dot product WH. Note that values different from 'frobenius'
    (or 2) and 'kullback-leibler' (or 1) lead to significantly slower
    fits. Note that for beta_loss <= 0 (or 'itakura-saito'), the input
    matrix X cannot contain zeros. Used only in 'mu' solver.

    .. versionadded:: 0.19

tol : float, default=1e-4
    Tolerance of the stopping condition.

max_iter : int, default=200
    Maximum number of iterations before timing out.

random_state : int, RandomState instance or None, default=None
    Used for initialisation (when ``init`` == 'nndsvdar' or
    'random'), and in Coordinate Descent. Pass an int for reproducible
    results across multiple function calls.
    See :term:`Glossary <random_state>`.

alpha_W : float, default=0.0
    Constant that multiplies the regularization terms of `W`. Set it to zero
    (default) to have no regularization on `W`.

    .. versionadded:: 1.0

alpha_H : float or "same", default="same"
    Constant that multiplies the regularization terms of `H`. Set it to zero to
    have no regularization on `H`. If "same" (default), it takes the same value as
    `alpha_W`.

    .. versionadded:: 1.0

l1_ratio : float, default=0.0
    The regularization mixing parameter, with 0 <= l1_ratio <= 1.
    For l1_ratio = 0 the penalty is an elementwise L2 penalty
    (aka Frobenius Norm).
    For l1_ratio = 1 it is an elementwise L1 penalty.
    For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

    .. versionadded:: 0.17
       Regularization parameter *l1_ratio* used in the Coordinate Descent
       solver.

verbose : int, default=0
    Whether to be verbose.

shuffle : bool, default=False
    If true, randomize the order of coordinates in the CD solver.

    .. versionadded:: 0.17
       *shuffle* parameter used in the Coordinate Descent solver.

Attributes
----------
components_ : ndarray of shape (n_components, n_features)
    Factorization matrix, sometimes called 'dictionary'.

n_components_ : int
    The number of components. It is same as the `n_components` parameter
    if it was given. Otherwise, it will be same as the number of
    features.

reconstruction_err_ : float
    Frobenius norm of the matrix difference, or beta-divergence, between
    the training data ``X`` and the reconstructed data ``WH`` from
    the fitted model.

n_iter_ : int
    Actual number of iterations.

n_features_in_ : int
    Number of features seen during :term:`fit`.

    .. versionadded:: 0.24

feature_names_in_ : ndarray of shape (`n_features_in_`,)
    Names of features seen during :term:`fit`. Defined only when `X`
    has feature names that are all strings.

    .. versionadded:: 1.0

See Also
--------
DictionaryLearning : Find a dictionary that sparsely encodes data.
MiniBatchSparsePCA : Mini-batch Sparse Principal Components Analysis.
PCA : Principal component analysis.
SparseCoder : Find a sparse representation of data from a fixed,
    precomputed dictionary.
SparsePCA : Sparse Principal Components Analysis.
TruncatedSVD : Dimensionality reduction using truncated SVD.

References
----------
.. [1] :doi:`"Fast local algorithms for large scale nonnegative matrix and tensor
   factorizations" <10.1587/transfun.E92.A.708>`
   Cichocki, Andrzej, and P. H. A. N. Anh-Huy. IEICE transactions on fundamentals
   of electronics, communications and computer sciences 92.3: 708-721, 2009.

.. [2] :doi:`"Algorithms for nonnegative matrix factorization with the
   beta-divergence" <10.1162/NECO_a_00168>`
   Fevotte, C., & Idier, J. (2011). Neural Computation, 23(9).

Examples
--------
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import NMF
>>> model = NMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_
r