ape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads. loc (`torch.FloatTensor` of shape `(batch_size,)` or `(batch_size, input_size)`, *optional*): Shift values of each time series' context window which is used to give the model inputs of the same magnitude and then used to shift back to the original magnitude. scale (`torch.FloatTensor` of shape `(batch_size,)` or `(batch_size, input_size)`, *optional*): Scaling values of each time series' context window which is used to give the model inputs of the same magnitude and then used to rescale back to the original magnitude. static_features (`torch.FloatTensor` of shape `(batch_size, feature size)`, *optional*): Static features of each time series' in a batch which are copied to the covariates at inference time. Nr