SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set: - `enable_sampling`: Enable subword regularization. - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout. - `nbest_size = {0,1}`: No sampling is performed. - `nbest_size > 1`: samples from the nbest_size results. - `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice) using forward-filtering-and-backward-sampling algorithm. - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout. Examples: ```python >>> from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer >>> model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M") >>> tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="en", tgt_lang="ro") >>> src_text = " UN Chief Says There Is No Military Solution in Syria" >>> tgt_text = "Åžeful ONU declară că nu există o soluÅ£ie militară în Siria" >>> model_inputs = tokenizer(src_text, text_target=tgt_text, return_tensors="pt") >>> outputs = model(**model_inputs) # should work ```Z input_idsZattention_maskÚ prefix_tokensÚ suffix_tokensNú