onal*, defaults to 1e-12): The epsilon used by the layer normalization layers. classifier_dropout_prob (`float`, *optional*, defaults to 0.1): The dropout ratio for attached classifiers. position_embedding_type (`str`, *optional*, defaults to `"absolute"`): Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155). For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658). Examples: ```python >>> from transformers import AlbertConfig, AlbertModel >>> # Initializing an ALBERT-xxlarge style configuration >>> albert_xxlarge_configuration = AlbertConfig() >>> # Initializing an ALBERT-base style configuration >>> albert_base_configuration = AlbertConfig( ... hidden_size=768, ... num_attention_heads=12, ... intermediate_size=3072, ... ) >>> # Initializing a model (with random weights) from the ALBERT-base style configuration >>> model = AlbertModel(albert_xxlarge_configuration) >>> # Accessing the model configuration >>> configuration = model.config ```Z