ttend while ``False`` values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight. [src/tgt/memory]_key_padding_mask provides specified elements in the key to be ignored by the attention. If a BoolTensor is provided, the positions with the value of ``True`` will be ignored while the position with the value of ``False`` will be unchanged. - output: :math:`(T, E)` for unbatched input, :math:`(T, N, E)` if `batch_first=False` or `(N, T, E)` if `batch_first=True`. Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is same as the input sequence (i.e. target) length of the decoder. where S is the source sequence length, T is the target sequence length, N is the batch size, E is the feature number Examples: >>> # xdoctest: +SKIP >>> output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask) é