paper](see https://arxiv.org/abs/1909.11556) for more details. scale_embedding (`bool`, *optional*, defaults to `False`): Scale embeddings by diving by sqrt(d_model). use_cache (`bool`, *optional*, defaults to `True`): Whether or not the model should return the last key/values attentions (not used by all models). num_labels (`int`, *optional*, defaults to 3): The number of labels to use in [`BartForSequenceClassification`]. forced_eos_token_id (`int`, *optional*, defaults to 2): The id of the token to force as the last generated token when `max_length` is reached. Usually set to `eos_token_id`. Example: ```python >>> from transformers import BartConfig, BartModel >>> # Initializing a BART facebook/bart-large style configuration >>> configuration = BartConfig() >>> # Initializing a model (with random weights) from the facebook/bart-large style configuration >>> model = BartModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```Z