router_logits (`torch.Tensor` of shape (batch_size, sequence_length))): Logits tensor of shape (batch_size, sequence_length, num_experts) corresponding to raw router logits. This is used later for computing router z-loss. N)r$