attn_pdrop (`float`, *optional*, defaults to 0.1):
    The dropout ratio for the attention.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
    The epsilon to use in the layer normalization layers.
initializer_range (`float`, *optional*, defaults to 0.02):
    The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
use_cache (`bool`, *optional*, defaults to `True`):
    Whether or not the model should return the last key/values attentions (not used by all models).

Example:

```python
>>> from transformers import CodeGenConfig, CodeGenModel

>>> # Initializing a CodeGen 6B configuration
>>> configuration = CodeGenConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = CodeGenModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
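Any of the parameters documented above can also be overridden at construction time by passing them as keyword arguments. A minimal sketch, assuming the parameter names listed in this docstring:

```python
>>> from transformers import CodeGenConfig

>>> # Override documented defaults: a custom attention dropout and
>>> # disabling the key/value cache (names as listed above).
>>> configuration = CodeGenConfig(attn_pdrop=0.2, use_cache=False)
>>> configuration.use_cache
False
```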