r of the mask token to have it eat the space before it. This is needed to preserve backward compatibility with all previously used models based on RoBERTa. TF)
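To make "eat the space before it" concrete, here is a minimal pure-Python sketch of the idea. It does not call any tokenizer library: the `MASK` string, the `split_on_mask` helper, and its `lstrip` parameter are hypothetical names chosen for illustration (tokenizer libraries commonly expose a similar left-strip flag on added tokens, but that mapping is an assumption here).

```python
import re

MASK = "<mask>"  # hypothetical mask token string, for illustration only


def split_on_mask(text, lstrip):
    """Split `text` around MASK.

    When `lstrip` is True, the mask token absorbs any whitespace
    immediately preceding it, so the piece before the mask carries
    no trailing space.
    """
    # Optionally let the match extend leftwards over whitespace.
    pattern = (r"\s*" if lstrip else "") + re.escape(MASK)
    pieces = []
    pos = 0
    for match in re.finditer(pattern, text):
        if match.start() > pos:
            pieces.append(text[pos:match.start()])
        pieces.append(MASK)
        pos = match.end()
    if pos < len(text):
        pieces.append(text[pos:])
    return pieces


# With lstrip=True the space before the mask is consumed by the mask itself.
print(split_on_mask("Hello <mask>.", lstrip=True))
# With lstrip=False the space stays attached to the preceding text.
print(split_on_mask("Hello <mask>.", lstrip=False))
```

The difference matters for models whose vocabulary encodes a leading space as part of the following word: if the space stayed with the preceding text, the input around the mask would be segmented differently from what such models saw during pretraining.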