masked), the loss is only computed for labels in `[0, ..., config.vocab_size - 1]`.