tor (`float`, *optional*, defaults to 0.002): A factor for initializing all weight matrices. output_router_logits (`bool`, *optional*, default to `False`): Whether or not to return the router logits of all experts. use_cache (`bool`, *optional*, defaults to `True`): Whether or not the model should return the last key/values attentions (not used by all models) zgptsan-japaneseZpast_key_valuesÚ