elative_key_query"`. For positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155). For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658). is_decoder (`bool`, *optional*, defaults to `False`): Whether the model is used as a decoder or not. If `False`, the model is used as an encoder. use_cache (`bool`, *optional*, defaults to `True`): Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if `config.is_decoder=True`. classifier_dropout (`float`, *optional*): The dropout ratio for the classification head. Examples: ```python >>> from transformers import XLMRobertaConfig, XLMRobertaModel >>> # Initializing a XLM-RoBERTa xlm-roberta-base style configuration >>> configuration = XLMRobertaConfig() >>> # Initializing a model (with random weights) from the xlm-roberta-base style configuration >>> model = XLMRobertaModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```z