`torch.FloatTensor` of shape `(batch_size, config.num_labels)`: Classification (or regression if `config.num_labels == 1`) scores (before SoftMax).

**hidden_states**: (*optional*, returned when `output_hidden_states=True`) list of `torch.FloatTensor` (one for the output of each layer + the output of the embeddings) of shape `(batch_size, sequence_length, hidden_size)`: Hidden states of the model at the output of each layer plus the initial embedding outputs.

**attentions**: (*optional*, returned when `output_attentions=True`) list of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`: Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Examples:

```python
# For example purposes. Not runnable: ImageEncoder, args, config, input_modal,
# input_ids, and labels must be defined by the caller.
transformer = BertModel.from_pretrained("bert-base-uncased")
encoder = ImageEncoder(args)
model = MMBTForClassification(config, transformer, encoder)
outputs = model(input_modal, input_ids, labels=labels)
# With labels provided, the first two elements are the loss and the logits.
loss, logits = outputs[:2]
```
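
Because the classification scores are returned before SoftMax, they usually need one more step before they can be read as probabilities or class predictions. The following is a minimal sketch of that step, not part of the model's API. It assumes made-up logits for `batch_size=2` and `num_labels=3`, and it uses plain Python lists instead of tensors so that it runs without PyTorch. The `softmax` helper and all values are hypothetical.

```python
import math


def softmax(row):
    """Convert one row of raw scores into probabilities that sum to 1."""
    shift = max(row)  # subtract the row maximum for numerical stability
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits of shape (batch_size=2, num_labels=3); values are made up.
logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]

# One probability distribution per example in the batch.
probs = [softmax(row) for row in logits]

# Predicted class index per example: the position of the largest score.
predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
```

With real `torch.FloatTensor` logits, the same two steps are typically written as `torch.softmax(logits, dim=-1)` and `logits.argmax(dim=-1)`. When `config.num_labels == 1` the scores are regression outputs, so no softmax applies.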