th a span classification head on top for extractive question-answering tasks such as [DocVQA](https://rrc.cvc.uab.es/?ch=17) (a linear layer on top of the final hidden-states output to compute `span start logits` and `span end logits`). c