attribute that is incremented in each ``forward``, it will always stay at the
initial value because the update is done on the replicas, which are destroyed
after ``forward``. However, :class:`~torch.nn.DataParallel` guarantees that
the replica on ``device[0]`` will have its parameters and buffers sharing
storage with the base parallelized :attr:`module`. So **in-place** updates to
the parameters or buffers on ``device[0]`` will be recorded. E.g.,
:class:`~torch.nn.BatchNorm2d` and :func:`~torch.nn.utils.spectral_norm` rely
on this behavior to update the buffers.

.. warning::
    Forward and backward hooks defined on :attr:`module` and its submodules
    will be invoked ``len(device_ids)`` times, each with inputs located on a
    particular device. In particular, the hooks are only guaranteed to be
    executed in the correct order with respect to operations on the
    corresponding devices. For example, it is not guaranteed that hooks set via
    :meth:`~torch.nn.Module.register_forward_pre_hook` are executed before
    `all` ``len(device_ids)`` :meth:`~torch.nn.Module.forward` calls, only
    that each such hook is executed before the corresponding
    :meth:`~torch.nn.Module.forward` call of that device.

.. warning::
    When :attr:`module` returns a scalar (i.e., a 0-dimensional tensor) in
    :func:`forward`, this wrapper will return a vector of length equal to the
    number of devices used in data parallelism, containing the result from
    each device.

.. note::
    There is a subtlety in using the
    ``pack sequence -> recurrent network -> unpack sequence`` pattern in a
    :class:`~torch.nn.Module` wrapped in :class:`~torch.nn.DataParallel`.
    See the :ref:`pack-rnn-unpack-with-data-parallelism` section in the FAQ for
    details.

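
The difference between a lost attribute update and a recorded in-place buffer
update can be mimicked in plain Python. The sketch below is only a toy model
under stated assumptions: ``Counter`` is a hypothetical class, a list stands in
for a buffer tensor, and ``copy.copy`` stands in for the replica on
``device[0]``. It is not the replication code used by
:class:`~torch.nn.DataParallel`, and it says nothing about replicas on other
devices.

```python
import copy


class Counter:
    """Toy stand-in for a module (hypothetical; not a torch.nn.Module)."""

    def __init__(self):
        self.calls = 0     # plain attribute, rebound on every forward
        self.buffer = [0]  # mutable storage, stands in for a buffer tensor

    def forward(self):
        self.calls += 1      # rebinding: only the replica's attribute changes
        self.buffer[0] += 1  # in-place: mutates storage shared with the base


base = Counter()
for _ in range(3):
    replica = copy.copy(base)  # shallow copy shares the list with ``base``
    replica.forward()
    del replica                # the replica is discarded after each forward

print(base.calls)      # 0: the counter update happened on the replicas only
print(base.buffer[0])  # 3: the in-place update reached the shared storage
```

In this toy model, state that must survive ``forward`` has to live in shared
storage that is mutated in place, which mirrors why the text singles out
in-place updates to buffers on ``device[0]``.
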

Args:
    module (Module): module to be parallelized
    device_ids (list of int or torch.device): CUDA devices (default: all devices)
    output_device (int or torch.device): device location of output (default: device_ids[0])

Attributes:
    module (Module): the module to be parallelized

Example::

    >>> # xdoctest: +SKIP
    >>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
    >>> output = net(input_var)  # input_var can be on any device, including CPU
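
The scalar-output warning matters when a module computes its loss inside
``forward``. The following is a minimal sketch, not a recommended training
setup: ``LossWrapper`` and the tensor shapes are hypothetical names and values
chosen for illustration. It reduces the wrapper's output with ``mean()`` so
that the result is a scalar whether the call produced a 0-dimensional tensor
or one entry per device.

```python
import torch
import torch.nn as nn


class LossWrapper(nn.Module):
    """Hypothetical module that returns a scalar loss from forward."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, x, target):
        # mse_loss with the default reduction returns a 0-dimensional tensor.
        return nn.functional.mse_loss(self.linear(x), target)


# The wrapped module must live on device_ids[0]; fall back to CPU without CUDA.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = nn.DataParallel(LossWrapper().to(device))

x = torch.randn(8, 4)       # inputs may start on any device, including CPU
target = torch.randn(8, 1)

per_device_loss = net(x, target)  # a vector when several devices are used
loss = per_device_loss.mean()     # reduce to a single scalar in every case
loss.backward()
```

Note that averaging the per-device values weights every device equally, which
matches the full-batch mean only when the batch is split evenly across devices.
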