    the complete dataset before timing out.

alpha_W : float, default=0.0
    Constant that multiplies the regularization terms of `W`. Set it to
    zero (default) to have no regularization on `W`.

alpha_H : float or "same", default="same"
    Constant that multiplies the regularization terms of `H`. Set it to
    zero to have no regularization on `H`. If "same" (default), it takes
    the same value as `alpha_W`.

l1_ratio : float, default=0.0
    The regularization mixing parameter, with 0 <= l1_ratio <= 1.
    For l1_ratio = 0 the penalty is an elementwise L2 penalty
    (aka Frobenius norm).
    For l1_ratio = 1 it is an elementwise L1 penalty.
    For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

forget_factor : float, default=0.7
    Amount of rescaling of past information. Its value could be 1 with
    finite datasets. Choosing values < 1 is recommended with online
    learning, as more recent batches will weigh more than past batches.

fresh_restarts : bool, default=False
    Whether to completely solve for W at each step. Doing fresh restarts
    will likely lead to a better solution for the same number of
    iterations, but it is much slower.

fresh_restarts_max_iter : int, default=30
    Maximum number of iterations when solving for W at each step. Only
    used when doing fresh restarts. These iterations may be stopped early
    based on a small change of W controlled by `tol`.

transform_max_iter : int, default=None
    Maximum number of iterations when solving for W at transform time.
    If None, it defaults to `max_iter`.

random_state : int, RandomState instance or None, default=None
    Used for initialisation (when ``init`` == 'nndsvdar' or 'random'),
    and in Coordinate Descent. Pass an int for reproducible results
    across multiple function calls.
    See :term:`Glossary <random_state>`.

verbose : bool, default=False
    Whether to be verbose.

Attributes
----------
components_ : ndarray of shape (n_components, n_features)
    Factorization matrix, sometimes called 'dictionary'.

n_components_ : int
    The number of components. It is the same as the `n_components`
    parameter if it was given. Otherwise, it will be the same as the
    number of features.

reconstruction_err_ : float
    Frobenius norm of the matrix difference, or beta-divergence, between
    the training data `X` and the reconstructed data `WH` from the fitted
    model.

n_iter_ : int
    Actual number of started iterations over the whole dataset.

n_steps_ : int
    Number of mini-batches processed.

n_features_in_ : int
    Number of features seen during :term:`fit`.

feature_names_in_ : ndarray of shape (`n_features_in_`,)
    Names of features seen during :term:`fit`. Defined only when `X` has
    feature names that are all strings.

See Also
--------
NMF : Non-negative matrix factorization.
MiniBatchDictionaryLearning : Finds a dictionary that can best be used to
    represent data using a sparse code.

References
----------
.. [1] :doi:`"Fast local algorithms for large scale nonnegative matrix and
   tensor factorizations" <10.1587/transfun.E92.A.708>`
   Cichocki, Andrzej, and P. H. A. N. Anh-Huy. IEICE Transactions on
   Fundamentals of Electronics, Communications and Computer Sciences,
   92.3: 708-721, 2009.

.. [2] :doi:`"Algorithms for nonnegative matrix factorization with the
   beta-divergence" <10.1162/NECO_a_00168>`
   Fevotte, C., & Idier, J. (2011). Neural Computation, 23(9).

.. [3] :doi:`"Online algorithms for nonnegative matrix factorization with
   the Itakura-Saito divergence" <10.1109/ASPAA.2011.6082314>`
   Lefevre, A., Bach, F., Fevotte, C. (2011). WASPAA.
Examples
--------
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import MiniBatchNMF
>>> model = MiniBatchNMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_
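The `alpha_W`, `alpha_H` and `l1_ratio` parameters described above add a
mixed L1/L2 penalty on the factors. A minimal sketch of this usage on the
same toy data (the values below are arbitrary illustrations, not
recommendations):

>>> reg_model = MiniBatchNMF(n_components=2, init='random',
...                          alpha_W=0.1, l1_ratio=0.5, random_state=0)
>>> W_reg = reg_model.fit_transform(X)  # H penalized too, via alpha_H="same"
>>> W_reg.shape
(6, 2)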
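For out-of-core or streaming settings, `partial_fit` consumes one
mini-batch per call, and a `forget_factor` < 1 makes recent batches weigh
more than older ones. A sketch on the same toy data (the three-way batch
split and the forget_factor value are arbitrary choices):

>>> online_model = MiniBatchNMF(n_components=2, init='random',
...                             forget_factor=0.5, random_state=0)
>>> for batch in np.array_split(X, 3):
...     _ = online_model.partial_fit(batch)  # each call is one mini-batch
>>> H_online = online_model.components_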