aux_loss for the sparse modules. router_logits (`tuple(torch.FloatTensor)`, *optional*, returned when `output_router_logits=True` is passed or when `config.add_router_probs=True`): Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, sequence_length, num_experts)`. Router logits of the encoder model, useful to compute the auxiliary loss and the z_loss for the sparse modules. NÚ