The standard deviation of the truncated_normal_initializer for initializing all weight matrices. use_cache (`bool`, *optional*, defaults to `True`): Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if `config.is_decoder=True`. bos_token_id (`int`, *optional*, defaults to 50256): The id of the beginning of sentence token in the vocabulary. eos_token_id (`int`, *optional*, defaults to 50256): The id of the end of sentence token in the vocabulary. Example: ```python >>> from transformers import GPTNeoConfig, GPTNeoModel >>> # Initializing a GPTNeo EleutherAI/gpt-neo-1.3B style configuration >>> configuration = GPTNeoConfig() >>> # Initializing a model (with random weights) from the EleutherAI/gpt-neo-1.3B style configuration >>> model = GPTNeoModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```Z