ol`, *optional*):
    Whether or not to return the attention tensors of all attention layers.
past_key_values (`Tuple[torch.Tensor, torch.Tensor]`, *optional*):
    Cached past key and value projection states.
use_cache (`bool`, *optional*):
    If set to `True`, the key and value states in `past_key_values` are returned and can be used to speed up decoding (see `past_key_values`).
r