samples necessary to estimate the given `estimator`. By default a :class:`~sklearn.linear_model.LinearRegression` estimator is assumed and `min_samples` is chosen as ``X.shape[1] + 1``. This parameter is highly dependent upon the model, so if a `estimator` other than :class:`~sklearn.linear_model.LinearRegression` is used, the user must provide a value. residual_threshold : float, default=None Maximum residual for a data sample to be classified as an inlier. By default the threshold is chosen as the MAD (median absolute deviation) of the target values `y`. Points whose residuals are strictly equal to the threshold are considered as inliers. is_data_valid : callable, default=None This function is called with the randomly selected data before the model is fitted to it: `is_data_valid(X, y)`. If its return value is False the current randomly chosen sub-sample is skipped. is_model_valid : callable, default=None This function is called with the estimated model and the randomly selected data: `is_model_valid(model, X, y)`. If its return value is False the current randomly chosen sub-sample is skipped. Rejecting samples with this function is computationally costlier than with `is_data_valid`. `is_model_valid` should therefore only be used if the estimated model is needed for making the rejection decision. max_trials : int, default=100 Maximum number of iterations for random sample selection. max_skips : int, default=np.inf Maximum number of iterations that can be skipped due to finding zero inliers or invalid data defined by ``is_data_valid`` or invalid models defined by ``is_model_valid``. .. versionadded:: 0.19 stop_n_inliers : int, default=np.inf Stop iteration if at least this number of inliers are found. stop_score : float, default=np.inf Stop iteration if score is greater equal than this threshold. stop_probability : float in range [0, 1], default=0.99 RANSAC iteration stops if at least one outlier-free set of the training data is sampled in RANSAC. This requires to generate at least N samples (iterations):: N >= log(1 - probability) / log(1 - e**m) where the probability (confidence) is typically set to high value such as 0.99 (the default) and e is the current fraction of inliers w.r.t. the total number of samples. loss : str, callable, default='absolute_error' String inputs, 'absolute_error' and 'squared_error' are supported which find the absolute error and squared error per sample respectively. If ``loss`` is a callable, then it should be a function that takes two arrays as inputs, the true and predicted value and returns a 1-D array with the i-th value of the array corresponding to the loss on ``X[i]``. If the loss on a sample is greater than the ``residual_threshold``, then this sample is classified as an outlier. .. versionadded:: 0.18 random_state : int, RandomState instance, default=None The generator used to initialize the centers. Pass an int for reproducible output across multiple function calls. See :term:`Glossary `. Attributes ---------- estimator_ : object Final model fitted on the inliers predicted by the "best" model found during RANSAC sampling (copy of the `estimator` object). n_trials_ : int Number of random selection trials until one of the stop criteria is met. It is always ``<= max_trials``. inlier_mask_ : bool array of shape [n_samples] Boolean mask of inliers classified as ``True``. n_skips_no_inliers_ : int Number of iterations skipped due to finding zero inliers. .. versionadded:: 0.19 n_skips_invalid_data_ : int Number of iterations skipped due to invalid data defined by ``is_data_valid``. .. versionadded:: 0.19 n_skips_invalid_model_ : int Number of iterations skipped due to an invalid model defined by ``is_model_valid``. .. versionadded:: 0.19 n_features_in_ : int Number of features seen during :term:`fit`. .. versionadded:: 0.24 feature_names_in_ : ndarray of shape (`n_features_in_`,) Names of features seen during :term:`fit`. Defined only when `X` has feature names that are all strings. .. versionadded:: 1.0 See Also -------- HuberRegressor : Linear regression model that is robust to outliers. TheilSenRegressor : Theil-Sen Estimator robust multivariate regression model. SGDRegressor : Fitted by minimizing a regularized empirical loss with SGD. References ---------- .. [1] https://en.wikipedia.org/wiki/RANSAC .. [2] https://www.sri.com/wp-content/uploads/2021/12/ransac-publication.pdf .. [3] https://bmva-archive.org.uk/bmvc/2009/Papers/Paper355/Paper355.pdf Examples -------- >>> from sklearn.linear_model import RANSACRegressor >>> from sklearn.datasets import make_regression >>> X, y = make_regression( ... n_samples=200, n_features=2, noise=4.0, random_state=0) >>> reg = RANSACRegressor(random_state=0).fit(X, y) >>> reg.score(X, y) 0.9885... >>> reg.predict(X[:1,]) array([-31.9417...]) For a more detailed example, see :ref:`sphx_glr_auto_examples_linear_model_plot_ransac.py` )