x, used to compute the weighted average in the self-attention heads. Those are the attention weights from every token with global attention to every token in the sequence. r