unk_token (`str`, *optional*, defaults to `<|endoftext|>`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. bos_token (`str`, *optional*, defaults to `<|endoftext|>`): The beginning of sequence token. eos_token (`str`, *optional*, defaults to `<|endoftext|>`): The end of sequence token. add_prefix_space (`bool`, *optional*, defaults to `False`): Whether or not to add an initial space to the input. This allows to treat the leading word just as any other word. (CodeGen tokenizer detect beginning of words by the preceding space). Z input_idsZattention_maskÚ