seq_len, batch, embed_dim)* encoder_padding_mask (`torch.ByteTensor`): binary ByteTensor of shape *(batch, src_len)* where padding elements are indicated by `1`. for t_tgt, t_src is excluded (or masked out), =0 means it is included in attention layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size *(config.encoder_attention_heads,)*. Returns: encoded output of shape *(seq_len, batch, embed_dim)* )