[`PreTrainedTokenizer.encode`] for details. [What are input IDs?](../glossary#input-ids) attention_mask (`Numpy array` or `tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*): Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: - `1` for tokens that are **not masked**, - `0` for tokens that are **masked**. [What are attention masks?](../glossary#attention-mask) langs (`tf.Tensor` or `Numpy array` of shape `(batch_size, sequence_length)`, *optional*): A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the *language name to language id* mapping is in `model.config.lang2id` (which is a dictionary string to int) and the *language id to language name* mapping is in `model.config.id2lang` (dictionary int to string). See usage examples detailed in the [multilingual documentation](../multilingual). token_type_ids (`tf.Tensor` or `Numpy array` of shape `(batch_size, sequence_length)`, *optional*): Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: - `0` corresponds to a *sentence A* token, - `1` corresponds to a *sentence B* token. [What are token type IDs?](../glossary#token-type-ids) position_ids (`tf.Tensor` or `Numpy array` of shape `(batch_size, sequence_length)`, *optional*): Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. [What are position IDs?](../glossary#position-ids) lengths (`tf.Tensor` or `Numpy array` of shape `(batch_size,)`, *optional*): Length of each sentence that can be used to avoid performing attention on padding token indices. You can also use *attention_mask* for the same result (see above), kept here for compatibility Indices selected in `[0, ..., input_ids.size(-1)]`: cache (`Dict[str, tf.Tensor]`, *optional*): Dictionary string to `tf.FloatTensor` that contains precomputed hidden states (key and values in the attention blocks) as computed by the model (see `cache` output below). Can be used to speed up sequential decoding. The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states. head_mask (`Numpy array` or `tf.Tensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*): Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`: - `1` indicates the head is **not masked**, - `0` indicates the head is **masked**. inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix. output_attentions (`bool`, *optional*): Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value in the config will be used instead. output_hidden_states (`bool`, *optional*): Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value in the config will be used instead. return_dict (`bool`, *optional*): Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. This argument can be used in eager mode, in graph mode the value will always be set to True. training (`bool`, *optional*, defaults to `False`): Whether or not to use the model in training mode (some modules like dropout modules have different behaviors between training and evaluation). c