LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more details. scale_embedding (`bool`, *optional*, defaults to `False`): Scale embeddings by diving by sqrt(d_model). use_cache (`bool`, *optional*, defaults to `True`): Whether or not the model should return the last key/values attentions (not used by all models) forced_eos_token_id (`int`, *optional*, defaults to 2): The id of the token to force as the last generated token when `max_length` is reached. Usually set to `eos_token_id`. Example: ```python >>> from transformers import BlenderbotSmallConfig, BlenderbotSmallModel >>> # Initializing a BlenderbotSmall facebook/blenderbot_small-90M style configuration >>> configuration = BlenderbotSmallConfig() >>> # Initializing a model (with random weights) from the facebook/blenderbot_small-90M style configuration >>> model = BlenderbotSmallModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```z