adAttention(embed_dim, num_heads) >>> attn_output, attn_output_weights = multihead_attn(query, key, value) .. _`FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness`: https://arxiv.org/abs/2205.14135 Ú