f a sequence built with special tokens. cls_token (`str`, *optional*, defaults to `""`): The cls token, which is a special token used as the first token for all tasks. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. pad_token (`str`, *optional*, defaults to `""`): The token used for padding, for example when batching sequences of different lengths. mask_token(`str`, *optional*, defaults to `""`): The token used for masking values. This is the token used when training this model with masking tasks. This is only used in the `"base"` tokenizer type. For `"multi"` tokenizer, masking is never done for the downstream tasks. language_codes (`str`, *optional*, defaults to `"base"`): What language codes to use. Should be one of `"base"` or `"multi"`. sp_model_kwargs (`dict`, *optional*): Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set: - `enable_sampling`: Enable subword regularization. - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout. - `nbest_size = {0,1}`: No sampling is performed. - `nbest_size > 1`: samples from the nbest_size results. - `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice) using forward-filtering-and-backward-sampling algorithm. - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout. Examples: ```python >>> from transformers import PLBartTokenizer >>> tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-python-en_XX", src_lang="python", tgt_lang="en_XX") >>> example_python_phrase = "def maximum(a,b,c):NEW_LINE_INDENTreturn max([a,b,c])" >>> expected_translation_english = "Returns the maximum value of a b c." >>> inputs = tokenizer(example_python_phrase, text_target=expected_translation_english, return_tensors="pt") ```Z input_idsZattention_maskÚ prefix_tokensÚ suffix_tokensú