ptional*, defaults to `[1000, 1000, 1000, 1000]`): The bounding box to use for the special [SEP] token. pad_token_box (`List[int]`, *optional*, defaults to `[0, 0, 0, 0]`): The bounding box to use for the special [PAD] token. pad_token_label (`int`, *optional*, defaults to -100): The label to use for padding tokens. Defaults to -100, which is the `ignore_index` of PyTorch's CrossEntropyLoss. only_label_first_subword (`bool`, *optional*, defaults to `True`): Whether or not to only label the first subword, in case word labels are provided. additional_special_tokens (`List[str]`, *optional*, defaults to `["NOTUSED", "NOTUSED"]`): Additional special tokens used by the tokenizer. sp_model_kwargs (`dict`, *optional*): Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set: - `enable_sampling`: Enable subword regularization. - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout. - `nbest_size = {0,1}`: No sampling is performed. - `nbest_size > 1`: samples from the nbest_size results. - `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice) using forward-filtering-and-backward-sampling algorithm. - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout. Attributes: sp_model (`SentencePieceProcessor`): The *SentencePiece* processor that is used for every conversion (string, tokens and IDs). Ú input_idsÚattention_maskú