heads. Those are the attention weights from every token with global attention to every token in the sequence.
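As a minimal sketch of what such a tensor looks like, the snippet below builds a toy weight array with the hypothetical shape `(batch, num_heads, num_global, seq_len)` (all dimensions and names here are illustrative assumptions, not taken from the source): each global token attends over the full sequence, so its weights along the last axis sum to 1 per head.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
batch, num_heads, seq_len, num_global = 1, 2, 8, 2

# Raw attention scores from each global token to every token in the sequence.
rng = np.random.default_rng(0)
scores = rng.standard_normal((batch, num_heads, num_global, seq_len))

# Softmax over the sequence axis: each global token's weights over the
# full sequence sum to 1, independently per head.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(weights.shape)         # (1, 2, 2, 8)
print(weights.sum(axis=-1))  # every entry is 1.0 (up to float rounding)
```

Each row `weights[b, h, g, :]` is one global token's attention distribution over the whole sequence for head `h`.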