ration. filter_value (`float`, *optional*, defaults to `-float("Inf")`): All filtered values will be set to this float value. min_tokens_to_keep (`int`, *optional*, defaults to 1): Minimum number of tokens that cannot be filtered. Examples: ```python >>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed >>> set_seed(0) >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2") >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2") >>> inputs = tokenizer("A sequence: 1, 2", return_tensors="pt") >>> # With sampling, the output is unexpected -- sometimes too unexpected. >>> outputs = model.generate(**inputs, do_sample=True) >>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]) A sequence: 1, 2, 0, 2, 2. 2, 2, 2, 2 >>> # With `top_p` sampling, the output gets restricted to high-probability tokens. >>> # Pro tip: In practice, LLMs use `top_p` in the 0.9-0.95 range. >>> outputs = model.generate(**inputs, do_sample=True, top_p=0.1) >>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]) A sequence: 1, 2, 3, 4, 5, 6, 7, 8, 9 ``` Ú