Has to be one of the following options: - `"last"`: Take the last token hidden state (like XLNet). - `"first"`: Take the first token hidden state (like BERT). - `"mean"`: Take the mean of all tokens hidden states. - `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2). - `"attn"`: Not implemented now, use multi-head attention. summary_use_proj (`bool`, *optional*, defaults to `True`): Argument used when doing sequence summary. Used in the sequence classification and multiple choice models. Whether or not to add a projection after the vector extraction. summary_activation (`str`, *optional*): Argument used when doing sequence summary. Used in the sequence classification and multiple choice models. Pass `"tanh"` for a tanh activation to the output, any other value will result in no activation. summary_proj_to_labels (`boo`, *optional*, defaults to `True`): Used in the sequence classification and multiple choice models. Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes. summary_last_dropout (`float`, *optional*, defaults to 0.1): Used in the sequence classification and multiple choice models. The dropout ratio to be used after the projection and activation. start_n_top (`int`, *optional*, defaults to 5): Used in the SQuAD evaluation script. end_n_top (`int`, *optional*, defaults to 5): Used in the SQuAD evaluation script. use_mems_eval (`bool`, *optional*, defaults to `True`): Whether or not the model should make use of the recurrent memory mechanism in evaluation mode. use_mems_train (`bool`, *optional*, defaults to `False`): Whether or not the model should make use of the recurrent memory mechanism in train mode. For pretraining, it is recommended to set `use_mems_train` to `True`. For fine-tuning, it is recommended to set `use_mems_train` to `False` as discussed [here](https://github.com/zihangdai/xlnet/issues/41#issuecomment-505102587). If `use_mems_train` is set to `True`, one has to make sure that the train batches are correctly pre-processed, *e.g.* `batch_1 = [[This line is], [This is the]]` and `batch_2 = [[ the first line], [ second line]]` and that all batches are of equal size. Examples: ```python >>> from transformers import XLNetConfig, XLNetModel >>> # Initializing a XLNet configuration >>> configuration = XLNetConfig() >>> # Initializing a model (with random weights) from the configuration >>> model = XLNetModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```Z