rch.FloatTensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`): Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. use_cache (`bool`, *optional*): If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`). Returns: Examples: ```python >>> from transformers import AutoProcessor, AutoModel >>> import requests >>> from PIL import Image >>> processor = AutoProcessor.from_pretrained("microsoft/git-base") >>> model = AutoModel.from_pretrained("microsoft/git-base") >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg" >>> image = Image.open(requests.get(url, stream=True).raw) >>> text = "this is an image of two cats" >>> inputs = processor(text, images=image, return_tensors="pt") >>> outputs = model(**inputs) >>> last_hidden_state = outputs.last_hidden_state ```NzDYou cannot specify both input_ids and inputs_embeds at the same timer=