_ids` docstring) Tokens with indices set to `-100` are
            ignored (masked), the loss is only computed for the tokens with labels n `[0, ..., config.vocab_size]`.
        use_cache (`bool`, *optional*):
            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
            `past_key_values`).

        Returns:

        Example:

        ```python
        >>> from transformers import AutoTokenizer, RoCBertForCausalLM, RoCBertConfig
        >>> import torch

        >>> tokenizer = AutoTokenizer.from_pretrained("weiweishi/roc-bert-base-zh")
        >>> config = RoCBertConfig.from_pretrained("weiweishi/roc-bert-base-zh")
        >>> config.is_decoder = True
        >>> model = RoCBertForCausalLM.from_pretrained("weiweishi/roc-bert-base-zh", config=config)

        >>> inputs = tokenizer("ä½ å¥½ï¼Œå¾ˆé«˜å…´è®¤è¯†ä½ ", return_tensors="pt")
        >>> outputs = model(**inputs)

        >>> prediction_logits = outputs.logits
        ```
        N)r‰