attentions (`tuple(torch.FloatTensor)` of shape `(batch_size, num_heads, sequence_length, sequence_length)`):
    Attention weights after the attention softmax, used to compute the weighted average in the self-attention
    heads. Returned when `output_attentions=True`.
masks_queries_logits (`tuple(torch.FloatTensor)` of shape `(batch_size, num_queries, height, width)`):
    Tuple of mask predictions from all layers of the transformer decoder.
intermediate_hidden_states (`tuple(torch.FloatTensor)` of shape `(num_queries, 1, hidden_size)`):
    Intermediate decoder activations, i.e. the output of each decoder layer, each of them having gone through
    a layernorm.
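These field names match the decoder outputs of Mask2Former-style models in the Hugging Face `transformers` library. Below is a minimal sketch of inspecting them, assuming `Mask2FormerForUniversalSegmentation` and an illustrative public checkpoint; it is an example, not the definitive usage:

```python
import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Illustrative checkpoint; other Mask2Former checkpoints should behave the same.
checkpoint = "facebook/mask2former-swin-tiny-coco-instance"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

# Dummy RGB image stands in for real input.
image = Image.fromarray(np.random.randint(0, 256, (384, 384, 3), dtype=np.uint8))
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    # output_attentions=True asks the model to also return attention weights.
    outputs = model(**inputs, output_attentions=True)

# Final-layer mask predictions: (batch_size, num_queries, height, width).
print(outputs.masks_queries_logits.shape)

# Attention weights are only present because output_attentions=True was passed.
if outputs.attentions is not None:
    print(len(outputs.attentions))
```

Note that the tuple-of-layers shapes documented above describe the transformer decoder's internal output; the top-level model output exposes `masks_queries_logits` for the final layer as a single tensor, which is why the print above shows one shape rather than a tuple.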