)`. Takes the largest min_tokens_to_keep tokens if no tokens satisfy this constraint. It addresses the issue of poor quality in long samples of text generated by neural language models leading to more coherent and fluent text. See [Truncation Sampling as Language Model Desmoothing](https://arxiv.org/abs/2210.15191) for more information. Note: `do_sample` must be set to `True` for this `LogitsWarper` to work. Args: epsilon (`float`): A float value in the range (0, 1). Hyperparameter used to calculate the dynamic cutoff value, `eta`. The suggested values from the paper ranges from 3e-4 to 4e-3 depending on the size of the model. filter_value (`float`, *optional*, defaults to `-float("Inf")`): All values that are found to be below the dynamic cutoff value, `eta`, are set to this float value. This parameter is useful when logits need to be modified for very low probability tokens that should be excluded from generation entirely. min_tokens_to_keep (`int`, *optional*, defaults to 1): Specifies the minimum number of tokens that must be kept for generation, regardless of their probabilities. For example, if `min_tokens_to_keep` is set to 1, at least one token will always be kept for generation, even if all tokens have probabilities below the cutoff `eta`. Examples: ```python >>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed >>> set_seed(0) >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2") >>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2") >>> inputs = tokenizer("A sequence: 1, 2", return_tensors="pt") >>> # With sampling, the output is unexpected -- sometimes too unexpected. >>> outputs = model.generate(**inputs, do_sample=True) >>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]) A sequence: 1, 2, 0, 2, 2. 2, 2, 2, 2 >>> # With eta sampling, the output gets restricted to high-probability tokens. You can see it as a dynamic form of >>> # epsilon sampling that adapts its cutoff probability based on the entropy (high entropy = lower cutoff). >>> # Pro tip: The paper recomends using `eta_cutoff` values between 3e-4 to 4e-3 >>> outputs = model.generate(**inputs, do_sample=True, eta_cutoff=0.1) >>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]) A sequence: 1, 2, 3, 4, 5, 6, 7, 8, 9 ``` rj