. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more details.
scale_embedding (`bool`, *optional*, defaults to `False`):
    Scale embeddings by multiplying by sqrt(d_model).
use_cache (`bool`, *optional*, defaults to `True`):
    Whether or not the model should return the last key/values attentions (not used by all models).
forced_eos_token_id (`int`, *optional*, defaults to 2):
    The id of the token to force as the last generated token when `max_length` is reached. Usually set to
    `eos_token_id`.

Example:

```python
>>> from transformers import MBartConfig, MBartModel

>>> # Initializing an MBART facebook/mbart-large-cc25 style configuration
>>> configuration = MBartConfig()

>>> # Initializing a model (with random weights) from the facebook/mbart-large-cc25 style configuration
>>> model = MBartModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
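The effect of `scale_embedding` and of a LayerDrop probability can be illustrated with a small, library-free sketch. It assumes that the scaling factor is applied as a multiplication by sqrt(d_model), as in the original Transformer, and that LayerDrop skips each layer independently with the given probability during training only. The helper names `scale_embeddings` and `apply_layers_with_layerdrop` are hypothetical and are not part of `transformers`.

```python
import math
import random


def scale_embeddings(vectors, d_model, scale_embedding=False):
    """Multiply each embedding vector by sqrt(d_model) when scaling is enabled."""
    factor = math.sqrt(d_model) if scale_embedding else 1.0
    return [[x * factor for x in vec] for vec in vectors]


def apply_layers_with_layerdrop(hidden, layers, layerdrop=0.0, training=True, rng=random):
    """Run `hidden` through `layers`, skipping each layer with probability `layerdrop` while training."""
    for layer in layers:
        if training and rng.random() < layerdrop:
            continue  # this layer is dropped for the current forward pass
        hidden = layer(hidden)
    return hidden


# One token with d_model = 4 (illustrative values only).
embeddings = [[1.0, 2.0, 3.0, 4.0]]
scaled = scale_embeddings(embeddings, d_model=4, scale_embedding=True)
print(scaled)  # [[2.0, 4.0, 6.0, 8.0]]

# Six toy "layers" that each add 1 to the hidden state.
layers = [lambda h: h + 1] * 6
print(apply_layers_with_layerdrop(0, layers, layerdrop=0.0))  # 6: no layer is dropped
print(apply_layers_with_layerdrop(0, layers, layerdrop=0.5, training=False))  # 6: LayerDrop is off at inference
```

With a nonzero `layerdrop` and `training=True`, the number of layers applied varies from call to call, which is why the sketch only prints the deterministic cases.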