or not the accelerator should split the batches yielded by the dataloaders across the devices. If `True`, the
    actual batch size used will be the same on any kind of distributed process, but it must be a round
    multiple of the `num_processes` you are using. If `False`, the actual batch size used will be the one set
    in your script multiplied by the number of processes.
mixed_precision (`str`, *optional*):
    Whether or not to use mixed precision training. Choose from `'no'`, `'fp16'`, `'bf16'` or `'fp8'`. Will
    default to the value in the environment variable `ACCELERATE_MIXED_PRECISION`, which will use the default
    value in the accelerate config of the current system or the flag passed with the `accelerate.launch`
    command. `'fp16'` requires PyTorch 1.6 or higher, `'bf16'` requires PyTorch 1.10 or higher, and `'fp8'`
    requires the installation of transformers-engine.
gradient_accumulation_steps (`int`, *optional*, defaults to 1):
    The number of steps over which gradients should be accumulated. A number > 1 should be combined with
    `Accelerator.accumulate`. If not passed, will default to the value in the environment variable
    `ACCELERATE_GRADIENT_ACCUMULATION_STEPS`. Can also be configured through a `GradientAccumulationPlugin`.
cpu (`bool`, *optional*):
    Whether or not to force the script to execute on CPU. Will ignore any available GPU if set to `True` and
    force the execution on one process only.
deepspeed_plugin (`DeepSpeedPlugin`, *optional*):
    Tweak your DeepSpeed-related args using this argument. This argument is optional and can be configured
    directly using *accelerate config*.
fsdp_plugin (`FullyShardedDataParallelPlugin`, *optional*):
    Tweak your FSDP-related args using this argument. This argument is optional and can be configured directly
    using *accelerate config*.
megatron_lm_plugin (`MegatronLMPlugin`, *optional*):
    Tweak your Megatron-LM-related args using this argument. This argument is optional and can be configured
    directly using *accelerate config*.
rng_types (list of `str` or [`~utils.RNGType`]):
    The list of random number generators to synchronize at the beginning of each iteration in your prepared
    dataloaders. Should be one or several of:

    - `"torch"`: the base torch random number generator
    - `"cuda"`: the CUDA random number generator (GPU only)
    - `"xla"`: the XLA random number generator (TPU only)
    - `"generator"`: the `torch.Generator` of the sampler (or batch sampler if there is no sampler in your
      dataloader) or of the iterable dataset (if it exists) if the underlying dataset is of that type.

    Will default to `["torch"]` for PyTorch versions <= 1.5.1 and `["generator"]` for PyTorch versions >= 1.6.
log_with (list of `str`, [`~utils.LoggerType`] or [`~tracking.GeneralTracker`], *optional*):
    A list of loggers to be set up for experiment tracking. Should be one or several of:

    - `"all"`
    - `"tensorboard"`
    - `"wandb"`
    - `"comet_ml"`

    If `"all"` is selected, will pick up all available trackers in the environment and initialize them. Can
    also accept implementations of `GeneralTracker` for custom trackers, and can be combined with `"all"`.
project_config (`ProjectConfiguration`, *optional*):
    A configuration for how saving the state can be handled.
project_dir (`str`, `os.PathLike`, *optional*):
    A path to a directory for storing data such as logs of locally-compatible loggers and potentially saved
    checkpoints.
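As an illustration of the construction arguments documented above, here is a minimal training-loop sketch
combining `mixed_precision` and `gradient_accumulation_steps` with `Accelerator.accumulate`. The toy model,
optimizer and dataloader are placeholder assumptions, and `mixed_precision="fp16"` assumes a GPU is available:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from accelerate import Accelerator

# Assumed setup: a toy model, optimizer and dataloader stand in for real ones.
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=4)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # Inside this context manager, gradient synchronization and the actual
    # optimizer step only take effect once every `gradient_accumulation_steps`
    # batches; on the other batches, gradients simply accumulate locally.
    with accelerator.accumulate(model):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```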
dispatch_batches (`bool`, *optional*):
    If set to `True`, the dataloader prepared by the Accelerator is only iterated through on the main process
    and then the batches are split and broadcast to each process. Will default to `True` for a `DataLoader`
    whose underlying dataset is an `IterableDataset` and `False` otherwise.
even_batches (`bool`, *optional*, defaults to `True`):
    If set to `True`, in cases where the total batch size across all processes does not exactly divide the
    dataset, samples at the start of the dataset will be duplicated so the batch can be divided equally among
    all workers.
step_scheduler_with_optimizer (`bool`, *optional*, defaults to `True`):
    Set `True` if the learning rate scheduler is stepped at the same time as the optimizer, `False` if only
    done under certain circumstances (at the end of each epoch, for instance).
kwargs_handlers (list of `KwargsHandler`, *optional*):
    A list of `KwargsHandler` instances to customize how the objects related to distributed training or mixed
    precision are created. See [kwargs](kwargs) for more information.
dynamo_backend (`str` or `DynamoBackend`, *optional*, defaults to `"no"`):
    Set to one of the possible dynamo backends to optimize your training with torch dynamo.
gradient_accumulation_plugin (`GradientAccumulationPlugin`, *optional*):
    A configuration for how gradient accumulation should be handled, if more tweaking than just the
    `gradient_accumulation_steps` is needed.

**Available attributes:**

- **device** (`torch.device`) -- The device to use.
- **distributed_type** ([`~utils.DistributedType`]) -- The distributed training configuration.
- **local_process_index** (`int`) -- The process index on the current machine.
- **mixed_precision** (`str`) -- The configured mixed precision mode.
- **num_processes** (`int`) -- The total number of processes used for training.
- **optimizer_step_was_skipped** (`bool`) -- Whether or not the optimizer update was skipped (because of
  gradient overflow in mixed precision), in which case the learning rate should not be changed.
- **process_index** (`int`) -- The overall index of the current process among all processes.
- **state** ([`~state.AcceleratorState`]) -- The distributed setup state.
- **sync_gradients** (`bool`) -- Whether the gradients are currently being synced across all processes.
- **use_distributed** (`bool`) -- Whether the current configuration is for distributed training.
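After construction, these attributes reflect the resolved runtime configuration. A quick sketch of inspecting
them; the printed values depend on how the script is launched (for instance via `accelerate launch`):

```python
from accelerate import Accelerator

accelerator = Accelerator()

# Resolved values depend on the launch configuration and available hardware.
print(accelerator.device)               # e.g. device(type='cuda', index=0)
print(accelerator.distributed_type)     # e.g. DistributedType.MULTI_GPU
print(accelerator.num_processes)        # total number of training processes
print(accelerator.process_index)        # global rank of this process
print(accelerator.local_process_index)  # rank of this process on this machine
print(accelerator.mixed_precision)      # 'no', 'fp16', 'bf16' or 'fp8'
print(accelerator.use_distributed)      # whether this is a distributed run
```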