ication head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. c
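
To make the parenthetical concrete, here is a minimal, framework-free sketch of what "a linear layer on top of the hidden-states output" amounts to for token-level tasks such as NER. It is not the library's implementation: the function names (`token_classification_head`, `predict_labels`), the toy hidden states, the weights, and the label set are all hypothetical, chosen only for illustration. In practice the projection would typically be a framework layer such as `torch.nn.Linear`, applied to the encoder's last hidden states.

```python
def token_classification_head(hidden_states, weight, bias):
    """Apply the same linear layer to each token's hidden vector.

    hidden_states: list of seq_len vectors, each of length hidden_size
    weight:        list of num_labels vectors, each of length hidden_size
    bias:          list of num_labels floats
    Returns logits as a list of seq_len vectors, each of length num_labels.
    """
    logits = []
    for h in hidden_states:
        # One score per label: dot(weight_row, h) + bias_entry
        row = [
            sum(w_i * h_i for w_i, h_i in zip(w, h)) + b
            for w, b in zip(weight, bias)
        ]
        logits.append(row)
    return logits


def predict_labels(logits, id2label):
    """Pick the highest-scoring label for every token (argmax per row)."""
    return [id2label[max(range(len(row)), key=row.__getitem__)] for row in logits]


# Toy inputs (assumed values): 3 tokens, hidden_size = 2, num_labels = 3.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
weight = [[-1.0, -1.0], [2.0, 0.0], [0.0, 2.0]]
bias = [0.0, 0.0, 0.0]
id2label = {0: "O", 1: "B-PER", 2: "B-LOC"}

logits = token_classification_head(hidden_states, weight, bias)
labels = predict_labels(logits, id2label)
print(labels)
```

The key point of the sketch is that the head holds no per-position parameters: the same `weight` and `bias` are reused for every token, so the output has one row of label scores per input token.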