            Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.

            Has to be one of the following options:

                - `"last"`: Take the last token hidden state (like XLNet).
                - `"first"`: Take the first token hidden state (like BERT).
                - `"mean"`: Take the mean of all tokens' hidden states.
                - `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2).
                - `"attn"`: Not implemented yet; intended to use multi-head attention.
        summary_use_proj (`bool`, *optional*, defaults to `True`):
            Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.

            Whether or not to add a projection after the vector extraction.
        summary_activation (`str`, *optional*):
            Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.

            Pass `"gelu"` to apply a GELU activation to the output; any other value results in no activation.
        summary_last_dropout (`float`, *optional*, defaults to 0.0):
            Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.

            The dropout ratio to be used after the projection and activation.
        position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
            Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. Use
            `"absolute"` for standard absolute position embeddings. For more information on `"relative_key"`, please
            refer to [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
            For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
            with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether or not the model should return the last key/values attentions (not used by all models). Only
            relevant if `config.is_decoder=True`.
        classifier_dropout (`float`, *optional*):
            The dropout ratio for the classification head.

    Examples:

    ```python
    >>> from transformers import ElectraConfig, ElectraModel

    >>> # Initializing an ELECTRA electra-base-uncased style configuration
    >>> configuration = ElectraConfig()

    >>> # Initializing a model (with random weights) from the electra-base-uncased style configuration
    >>> model = ElectraModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```
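
    The sequence-summary options listed above (`"last"`, `"first"`, `"mean"`, `"cls_index"`) can be illustrated
    with a minimal, dependency-free sketch. This is not the library's implementation: `summarize` is a hypothetical
    helper that works on plain Python lists for a single sequence, whereas the real models operate on batched
    tensors. It only shows which token vector each option would select.

    ```python
    from statistics import fmean


    def summarize(hidden_states, summary_type="first", cls_index=None):
        """Hypothetical helper: reduce per-token vectors to one summary vector.

        `hidden_states` is a list of token vectors (one list of floats per token).
        """
        if summary_type == "last":
            # Last token hidden state (XLNet-style).
            return hidden_states[-1]
        if summary_type == "first":
            # First token hidden state (BERT-style).
            return hidden_states[0]
        if summary_type == "mean":
            # Average each hidden dimension across all tokens.
            return [fmean(dim) for dim in zip(*hidden_states)]
        if summary_type == "cls_index":
            # Caller supplies the classification token position (GPT/GPT-2-style).
            # Simplification in this sketch: the position is required.
            if cls_index is None:
                raise ValueError("cls_index is required in this sketch")
            return hidden_states[cls_index]
        # "attn" is documented as not implemented.
        raise NotImplementedError(f"unsupported summary_type: {summary_type!r}")


    # Three tokens, hidden size 2.
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]

    first_vec = summarize(tokens, "first")
    last_vec = summarize(tokens, "last")
    mean_vec = summarize(tokens, "mean")
    cls_vec = summarize(tokens, "cls_index", cls_index=1)
    ```

    In this sketch the optional projection, activation, and dropout controlled by `summary_use_proj`,
    `summary_activation`, and `summary_last_dropout` are omitted; they would be applied to the selected vector.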