`(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. question_embeds (`tf.Tensor`): The question embeddings obtained by the text projection layer. Nr2