hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)* attention_mask (`tf.Tensor`): attention mask of size *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values. layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size *(encoder_attention_heads,)* )