d), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size - 1]`
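
The masking rule can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function name `masked_cross_entropy` is hypothetical, and the sentinel value `-100` for masked positions is an assumption (it matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`). Positions carrying the sentinel are skipped, and the mean is taken only over positions whose label is a valid vocabulary index.

```python
import math

IGNORE_INDEX = -100  # assumed sentinel marking masked (ignored) positions


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not `ignore_index`.

    `logits` is a list of per-token score lists (one score per vocabulary
    entry); `labels` is a list of integer class indices of the same length.
    """
    vocab_size = len(logits[0])
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked token: contributes nothing to the loss
        if not 0 <= label < vocab_size:
            raise ValueError(f"label {label} is outside the vocabulary")
        # Numerically stable log-sum-exp of the row.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[label]  # negative log-probability of the label
        count += 1
    # With every position masked there is nothing to average; NaN is a
    # design choice of this sketch.
    return total / count if count else float("nan")


logits = [[0.0, 0.0], [5.0, 1.0], [1.0, 2.0]]
labels = [0, -100, 1]  # the middle token is masked
loss = masked_cross_entropy(logits, labels)
```

Because the middle position is masked, changing its logits leaves `loss` unchanged; only the first and third tokens are averaged.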