tied to the input embeddings, the classification head takes as input the hidden state at a specified classification token index in the input sequence).
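
The two heads described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the class `DoubleHeads`, the helper `matvec`, and all sizes are hypothetical names chosen for illustration, and the hidden states are hand-made stand-ins for the transformer output. It assumes the language modeling head reuses the embedding matrix as its projection and that the classification head reads a single position selected by `cls_index`.

```python
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class DoubleHeads:
    """Toy output layer with a tied LM head and a classification head (hypothetical)."""

    def __init__(self, vocab_size, hidden_size, num_labels, seed=0):
        rng = random.Random(seed)
        # Input embedding table: one hidden_size vector per vocabulary entry.
        self.embedding = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(vocab_size)
        ]
        # Weight tying: the LM head is the same object as the embedding table,
        # so an update to one is an update to the other.
        self.lm_head = self.embedding
        # The classification head has its own, untied weights.
        self.classifier = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(num_labels)
        ]

    def lm_logits(self, hidden_states):
        # One vocabulary-sized score vector per sequence position.
        return [matvec(self.lm_head, h) for h in hidden_states]

    def cls_logits(self, hidden_states, cls_index):
        # Only the hidden state at the classification token position is used.
        return matvec(self.classifier, hidden_states[cls_index])


model = DoubleHeads(vocab_size=10, hidden_size=4, num_labels=2)
# Stand-in for the transformer output: 5 positions, hidden size 4.
hidden_states = [[0.1 * (t + 1)] * 4 for t in range(5)]
lm = model.lm_logits(hidden_states)                  # shape: 5 x 10
cls = model.cls_logits(hidden_states, cls_index=4)   # shape: 2
```

Because `lm_head` and `embedding` refer to the same list, tying adds no parameters to the model. The classification token is typically placed at a known position (here the last one), and its index is passed in explicitly.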