t, and 'saga' uses
      its improved, unbiased version named SAGA. Both methods also use an
      iterative procedure, and are often faster than other solvers when
      both n_samples and n_features are large. Note that 'sag' and
      'saga' fast convergence is only guaranteed on features with
      approximately the same scale. You can preprocess the data with a
      scaler from sklearn.preprocessing.

    - 'lbfgs' uses L-BFGS-B algorithm implemented in
      `scipy.optimize.minimize`. It can be used only when `positive`
      is True.

    All solvers except 'svd' support both dense and sparse data. However, only
    'lsqr', 'sag', 'sparse_cg', and 'lbfgs' support sparse input when
    `fit_intercept` is True.

    .. versionadded:: 0.17
       Stochastic Average Gradient descent solver.
    .. versionadded:: 0.19
       SAGA solver.

positive : bool, default=False
    When set to ``True``, forces the coefficients to be positive.
    Only 'lbfgs' solver is supported in this case.

random_state : int, RandomState instance, default=None
    Used when ``solver`` == 'sag' or 'saga' to shuffle the data.
    See :term:`Glossary <random_state>` for details.

    .. versionadded:: 0.17
       `random_state` to support Stochastic Average Gradient.

Attributes
----------
coef_ : ndarray of shape (n_features,) or (n_targets, n_features)
    Weight vector(s).

intercept_ : float or ndarray of shape (n_targets,)
    Independent term in decision function. Set to 0.0 if
    ``fit_intercept = False``.

n_iter_ : None or ndarray of shape (n_targets,)
    Actual number of iterations for each target. Available only for
    sag and lsqr solvers. Other solvers will return None.

    .. versionadded:: 0.17

n_features_in_ : int
    Number of features seen during :term:`fit`.

    .. versionadded:: 0.24

feature_names_in_ : ndarray of shape (`n_features_in_`,)
    Names of features seen during :term:`fit`. Defined only when `X`
    has feature names that are all strings.

    .. versionadded:: 1.0

solver_ : str
    The solver that was used at fit time by the computational
    routines.

    .. versionadded:: 1.5

See Also
--------
RidgeClassifier : Ridge classifier.
RidgeCV : Ridge regression with built-in cross validation.
:class:`~sklearn.kernel_ridge.KernelRidge` : Kernel ridge regression
    combines ridge regression with the kernel trick.

Notes
-----
Regularization improves the conditioning of the problem and
reduces the variance of the estimates. Larger values specify stronger
regularization. Alpha corresponds to ``1 / (2C)`` in other linear
models such as :class:`~sklearn.linear_model.LogisticRegression` or
:class:`~sklearn.svm.LinearSVC`.

Examples
--------
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> rng = np.random.RandomState(0)
>>> y = rng.randn(n_samples)
>>> X = rng.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge()
TNrM