ult, this layer uses instance statistics computed from input data in both
training and evaluation modes.

If :attr:`track_running_stats` is set to ``True``, during training this
layer keeps running estimates of its computed mean and variance, which are
then used for normalization during evaluation. The running estimates are
kept with a default :attr:`momentum` of 0.1.

.. note::
    This :attr:`momentum` argument is different from the one used in optimizer
    classes and from the conventional notion of momentum. Mathematically, the
    update rule for running statistics here is
    :math:`\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t`,
    where :math:`\hat{x}` is the estimated statistic and :math:`x_t` is the
    new observed value.

.. note::
    :class:`InstanceNorm1d` and :class:`LayerNorm` are very similar, but have
    some subtle differences. :class:`InstanceNorm1d` is applied to each
    channel of channeled data such as multidimensional time series, whereas
    :class:`LayerNorm` is usually applied to an entire sample, often in NLP
    tasks. Additionally, :class:`LayerNorm` applies an elementwise affine
    transform, while :class:`InstanceNorm1d` usually does not apply an affine
    transform.

Args:
    num_features: number of features or channels :math:`C` of the input
    eps: a value added to the denominator for numerical stability. Default: 1e-5
    momentum: the value used for the running_mean and running_var computation. Default: 0.1
    affine: a boolean value that when set to ``True``, this module has
        learnable affine parameters, initialized the same way as done for
        batch normalization. Default: ``False``.
    track_running_stats: a boolean value that when set to ``True``, this
        module tracks the running mean and variance, and when set to ``False``,
        this module does not track such statistics and always uses instance
        statistics computed from the input in both training and eval modes.
        Default: ``False``

Shape:
    - Input: :math:`(N, C, L)` or :math:`(C, L)`
    - Output: :math:`(N, C, L)` or :math:`(C, L)` (same shape as input)

Examples::

    >>> # Without Learnable Parameters
    >>> m = nn.InstanceNorm1d(100)
    >>> # With Learnable Parameters
    >>> m = nn.InstanceNorm1d(100, affine=True)
    >>> input = torch.randn(20, 100, 40)
    >>> output = m(input)

c
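
The running-statistics update rule and the per-channel normalization described above can be illustrated with a small pure-Python sketch. This is not the library's implementation: the helper names ``update_running_stat`` and ``instance_norm_1d`` are hypothetical, the input is assumed to be an unbatched :math:`(C, L)` list of lists, and the variance is assumed to be the biased (population) estimate.

```python
from statistics import fmean


def update_running_stat(running, observed, momentum=0.1):
    """Hypothetical helper: new = (1 - momentum) * running + momentum * observed."""
    return (1.0 - momentum) * running + momentum * observed


def instance_norm_1d(x, eps=1e-5):
    """Hypothetical helper: normalize each channel (row) of a (C, L) input.

    Each channel uses its own mean and variance. Assumption: the variance
    is the biased (population) estimate, and no affine transform is applied.
    """
    out = []
    for channel in x:
        mean = fmean(channel)
        var = fmean([(v - mean) ** 2 for v in channel])
        out.append([(v - mean) / (var + eps) ** 0.5 for v in channel])
    return out


# Illustrative running estimate, started at 0.0 and fed the same
# observed value three times with the default momentum of 0.1.
running_mean = 0.0
for observed_mean in [2.0, 2.0, 2.0]:
    running_mean = update_running_stat(running_mean, observed_mean)

# Two channels on very different scales end up with near-identical
# normalized values, because each channel is normalized independently.
normalized = instance_norm_1d([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
```

With a constant observation :math:`m` and a running estimate that starts at zero, :math:`t` updates give :math:`m \times (1 - (1 - \text{momentum})^t)`, so a momentum of 0.1 moves the estimate toward new observations slowly. This is the sense in which this ``momentum`` differs from the optimizer notion: it weights the new observation, not the accumulated history.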