shape `(batch_size, num_aggregation_labels)`): Logits per aggregation operation. aggregate_mask (`tf.Tensor` of shape `(batch_size, )`): A mask set to 1 for examples that should use aggregation functions Returns: aggregation_loss_unknown (`tf.Tensor` of shape `(batch_size,)`): Aggregation loss (in case of answer supervision) per example. ru