plied if `sample_posterior=False`. tol : float, default=1e-3 Tolerance of the stopping condition. n_nearest_features : int, default=None Number of other features to use to estimate the missing values of each feature column. Nearness between features is measured using the absolute correlation coefficient between each feature pair (after initial imputation). To ensure coverage of features throughout the imputation process, the neighbor features are not necessarily nearest, but are drawn with probability proportional to correlation for each imputed target feature. Can provide significant speed-up when the number of features is huge. If `None`, all features will be used. initial_strategy : {'mean', 'median', 'most_frequent', 'constant'}, default='mean' Which strategy to use to initialize the missing values. Same as the `strategy` parameter in :class:`~sklearn.impute.SimpleImputer`. fill_value : str or numerical value, default=None When `strategy="constant"`, `fill_value` is used to replace all occurrences of missing_values. For string or object data types, `fill_value` must be a string. If `None`, `fill_value` will be 0 when imputing numerical data and "missing_value" for strings or object data types. .. versionadded:: 1.3 imputation_order : {'ascending', 'descending', 'roman', 'arabic', 'random'}, default='ascending' The order in which the features will be imputed. Possible values: - `'ascending'`: From features with fewest missing values to most. - `'descending'`: From features with most missing values to fewest. - `'roman'`: Left to right. - `'arabic'`: Right to left. - `'random'`: A random order for each round. skip_complete : bool, default=False If `True` then features with missing values during :meth:`transform` which did not have any missing values during :meth:`fit` will be imputed with the initial imputation method only. Set to `True` if you have many features with no missing values at both :meth:`fit` and :meth:`transform` time to save compute. min_value : float or array-like of shape (n_features,), default=-np.inf Minimum possible imputed value. Broadcast to shape `(n_features,)` if scalar. If array-like, expects shape `(n_features,)`, one min value for each feature. The default is `-np.inf`. .. versionchanged:: 0.23 Added support for array-like. max_value : float or array-like of shape (n_features,), default=np.inf Maximum possible imputed value. Broadcast to shape `(n_features,)` if scalar. If array-like, expects shape `(n_features,)`, one max value for each feature. The default is `np.inf`. .. versionchanged:: 0.23 Added support for array-like. verbose : int, default=0 Verbosity flag, controls the debug messages that are issued as functions are evaluated. The higher, the more verbose. Can be 0, 1, or 2. random_state : int, RandomState instance or None, default=None The seed of the pseudo random number generator to use. Randomizes selection of estimator features if `n_nearest_features` is not `None`, the `imputation_order` if `random`, and the sampling from posterior if `sample_posterior=True`. Use an integer for determinism. See :term:`the Glossary `. add_indicator : bool, default=False If `True`, a :class:`MissingIndicator` transform will stack onto output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear on the missing indicator even if there are missing values at transform/test time. keep_empty_features : bool, default=False If True, features that consist exclusively of missing values when `fit` is called are returned in results when `transform` is called. The imputed value is always `0` except when `initial_strategy="constant"` in which case `fill_value` will be used instead. .. versionadded:: 1.2 Attributes ---------- initial_imputer_ : object of type :class:`~sklearn.impute.SimpleImputer` Imputer used to initialize the missing values. imputation_sequence_ : list of tuples Each tuple has `(feat_idx, neighbor_feat_idx, estimator)`, where `feat_idx` is the current feature to be imputed, `neighbor_feat_idx` is the array of other features used to impute the current feature, and `estimator` is the trained estimator used for the imputation. Length is `self.n_features_with_missing_ * self.n_iter_`. n_iter_ : int Number of iteration rounds that occurred. Will be less than `self.max_iter` if early stopping criterion was reached. n_features_in_ : int Number of features seen during :term:`fit`. .. versionadded:: 0.24 feature_names_in_ : ndarray of shape (`n_features_in_`,) Names of features seen during :term:`fit`. Defined only when `X` has feature names that are all strings. .. versionadded:: 1.0 n_features_with_missing_ : int Number of features with missing values. indicator_ : :class:`~sklearn.impute.MissingIndicator` Indicator used to add binary indicators for missing values. `None` if `add_indicator=False`. random_state_ : RandomState instance RandomState instance that is generated either from a seed, the random number generator or by `np.random`. See Also -------- SimpleImputer : Univariate imputer for completing missing values with simple strategies. KNNImputer : Multivariate imputer that estimates missing features using nearest samples. Notes ----- To support imputation in inductive mode we store each feature's estimator during the :meth:`fit` phase, and predict without refitting (in order) during the :meth:`transform` phase. Features which contain all missing values at :meth:`fit` are discarded upon :meth:`transform`. Using defaults, the imputer scales in :math:`\mathcal{O}(knp^3\min(n,p))` where :math:`k` = `max_iter`, :math:`n` the number of samples and :math:`p` the number of features. It thus becomes prohibitively costly when the number of features increases. Setting `n_nearest_features << n_features`, `skip_complete=True` or increasing `tol` can help to reduce its computational cost. Depending on the nature of missing values, simple imputers can be preferable in a prediction context. References ---------- .. [1] `Stef van Buuren, Karin Groothuis-Oudshoorn (2011). "mice: Multivariate Imputation by Chained Equations in R". Journal of Statistical Software 45: 1-67. `_ .. [2] `S. F. Buck, (1960). "A Method of Estimation of Missing Values in Multivariate Data Suitable for use with an Electronic Computer". Journal of the Royal Statistical Society 22(2): 302-306. `_ Examples -------- >>> import numpy as np >>> from sklearn.experimental import enable_iterative_imputer >>> from sklearn.impute import IterativeImputer >>> imp_mean = IterativeImputer(random_state=0) >>> imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]]) IterativeImputer(random_state=0) >>> X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]] >>> imp_mean.transform(X) array([[ 6.9584..., 2. , 3. ], [ 4. , 2.6000..., 6. ], [10. , 4.9999..., 9. ]]) For a more detailed example see :ref:`sphx_glr_auto_examples_impute_plot_missing_values.py` or :ref:`sphx_glr_auto_examples_impute_plot_iterative_imputer_variants_comparison.py`. NÚ