tio for all dropout layers. layer_norm_eps (`float`, *optional*, defaults to 1e-6): The epsilon used by the layer normalization layers. router_z_loss_coef (`float`, *optional*, defaults to 0.001): The z loss factor for the total loss. router_aux_loss_coef (`float`, *optional*, defaults to 0.001): The aux loss factor for the total loss. initializer_factor (`float`, *optional*, defaults to 1): A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing). feed_forward_proj (`string`, *optional*, defaults to `"relu"`): Type of feed forward layer to be used. Should be one of `"relu"` or `"gated-gelu"`. SwitchTransformersv1.1 uses the `"gated-gelu"` feed forward projection. Original SwitchTransformers uses `"relu"`. add_router_probs (`bool`, *optional*, defaults to `False`): Whether to output router probabilities to compute router auxiliary loss. use_cache (`bool`, *optional*, defaults to `True`): Whether or not the model should return the last key/values attentions (not used by all models). Z