asked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.

Returns:

The `use_cache` argument is changed to `False` since `labels` is provided.
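The two behaviors described here can be sketched in plain Python: the loss is averaged only over positions that carry a real label, and `use_cache` is forced to `False` whenever `labels` is passed. This is a minimal illustration, not the library's implementation. It assumes excluded positions are marked with `-100`, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The names `masked_cross_entropy`, `resolve_use_cache`, and `IGNORE_INDEX` are hypothetical, and `warnings.warn` stands in for whatever logger the real model code uses.

```python
import math
import warnings
from typing import Optional, Sequence

# Assumed marker for excluded positions (PyTorch's default ignore_index).
IGNORE_INDEX = -100


def masked_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy over positions whose label is not IGNORE_INDEX (hypothetical helper)."""
    total = 0.0
    count = 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # excluded position: contributes nothing to the loss
        # Numerically stable log-softmax: -log p(label) = logsumexp(row) - row[label]
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    # Mean over zero positions is undefined; mirror "mean" reduction by returning NaN.
    return total / count if count else float("nan")


def resolve_use_cache(use_cache: bool, labels: Optional[object]) -> bool:
    """Force use_cache off when labels are given (hypothetical helper)."""
    if labels is not None:
        if use_cache:
            # Stand-in for the model's logger warning.
            warnings.warn("The `use_cache` argument is changed to `False` since `labels` is provided.")
        return False
    return use_cache
```

For example, with logits `[[0.0, 0.0], [5.0, 0.0]]` and labels `[0, -100]`, only the first position is scored, so the loss is `log(2)`. Disabling the cache when labels are present reflects the assumption that a forward pass with labels is a training-style pass, where cached key/value states are not needed.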