In some instances we want to skip the basic punctuation splitting so that later tokenization can capture the full context of the words, such as contractions. TNc