ed**, - 0 for tokens that are **masked**. [What are attention masks?](../glossary#attention-mask) encoder_outputs (`tuple(tuple(torch.FloatTensor)`, *optional*) Tuple consists of (`generator_enc_last_hidden_state`, *optional*: `generator_enc_hidden_states`, *optional*: `generator_enc_attentions`). `generator_enc_last_hidden_state` of shape `(batch_size, n_docs * sequence_length, hidden_size)` is a sequence of hidden-states at the output of the last layer of the generator's encoder. Used by the ([`RagModel`]) model during decoding. decoder_input_ids (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*): Provide for generation tasks. `None` by default, construct as per instructions for the generator model you're using with your RAG instance. decoder_attention_mask (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*): Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also be used by default. past_key_values (`tuple(tuple(torch.FloatTensor))`): Tuple consists of two elements: `encoder_outputs` of the RAG model (see `encoder_outputs`) and `past_key_values` of the underlying generator. Can be used to speed up decoding. `past_key_values` are used in the ([`RagTokenForGeneration`]) model during decoding. doc_scores (`torch.FloatTensor` of shape `(batch_size, config.n_docs)`): Score between each retrieved document embeddings (see `retrieved_doc_embeds`) and `question_encoder_last_hidden_state`. If the model has is not initialized with a `retriever` `doc_scores` has to be provided to the forward pass. `doc_scores` can be computed via `question_encoder_last_hidden_state` and `retrieved_doc_embeds`, see examples for more information. context_input_ids (`torch.LongTensor` of shape `(batch_size * config.n_docs, config.max_combined_length)`, *optional*, returned when *output_retrieved=True*): Input IDs post-processed from the retrieved documents and the question encoder `input_ids` by the retriever. If the model was not initialized with a `retriever` ``context_input_ids` has to be provided to the forward pass. `context_input_ids` are returned by [`~RagRetriever.__call__`]. context_attention_mask (`torch.LongTensor` of shape `(batch_size * config.n_docs, config.max_combined_length)`,*optional*, returned when *output_retrieved=True*): Attention mask post-processed from the retrieved documents and the question encoder `input_ids` by the retriever. If the model has is not initialized with a `retriever` `context_attention_mask` has to be provided to the forward pass. `context_attention_mask` are returned by [`~RagRetriever.__call__`]. use_cache (`bool`, *optional*, defaults to `True`): If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`). output_attentions (`bool`, *optional*): Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail. output_hidden_states (`bool`, *optional*): Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail. output_retrieved(`bool`, *optional*): Whether or not to return the `retrieved_doc_embeds`, `retrieved_doc_ids`, `context_input_ids` and `context_attention_mask`. See returned tensors for more detail. n_docs (`int`, *optional*, defaults to `config.n_docs``) Number of documents to retrieve and/or number of documents for which to generate an answer. c