yer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden states of the generator encoder at the output of each layer plus the initial embedding outputs. generator_enc_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the generator encoder, after the attention softmax, used to compute the weighted average in the self-attention heads. generator_dec_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`): Tuple of `torch.FloatTensor` (one for the output of the embeddings and one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden states of the generator decoder at the output of each layer plus the initial embedding outputs. generator_dec_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights of the generator decoder, after the attention softmax, used to compute the weighted average in the self-attention heads. generator_cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Cross-attentions weights of the generator decoder, after the attention softmax, used to compute the weighted average in the cross-attention heads. Nr