atch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values. layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size `(encoder_attention_heads,)` )