nal*):
    Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:

    - 1 indicates the head is **not masked**,
    - 0 indicates the head is **masked**.

inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
    is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
    model's internal embedding lookup matrix.
num_hashes (`int`, *optional*):
    The number of hashing rounds that should be performed during bucketing. Setting this argument overwrites
    the default defined in `config.num_hashes`.

    For more information, see `num_hashes` in [`ReformerConfig`].
past_buckets_states (`List[Tuple(torch.LongTensor, torch.FloatTensor)]`, *optional*):
    List of `Tuple(torch.LongTensor, torch.FloatTensor)` of length `config.n_layers`, with the first element
    being the previous *buckets* of shape `(batch_size, num_heads, num_hashes, sequence_length)` and the
    second being the previous *hidden_states* of shape `(batch_size, sequence_length, hidden_size)`.

    Contains precomputed hidden-states and buckets (only relevant for LSH Self-Attention). Can be used to speed
    up sequential decoding.
use_cache (`bool`, *optional*):
    If set to `True`, the `past_buckets_states` are returned and can be used to speed up decoding (see
    `past_buckets_states`).
output_attentions (`bool`, *optional*):
    Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
    tensors for more detail.
output_hidden_states (`bool`, *optional*):
    Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
    more detail.
return_dict (`bool`, *optional*):
    Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
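The head-mask convention documented above (1 keeps a head, 0 nullifies it) can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the helper `apply_head_mask`, the per-head output layout, and the sample values are hypothetical, and the sketch assumes the mask is applied as a simple per-head multiplication.

```python
from typing import List


def apply_head_mask(
    head_outputs: List[List[float]], head_mask: List[float]
) -> List[List[float]]:
    """Scale each head's output vector by its mask value (1 keeps, 0 nullifies).

    Hypothetical helper for illustration only; assumes one mask value per head.
    """
    if len(head_outputs) != len(head_mask):
        raise ValueError("head_mask needs one entry per attention head")
    return [[m * x for x in head] for head, m in zip(head_outputs, head_mask)]


# Three hypothetical heads, each producing a 2-dimensional output.
outputs = [[0.5, -1.0], [2.0, 3.0], [1.5, 0.25]]
mask = [1.0, 0.0, 1.0]  # nullify the second head only
masked = apply_head_mask(outputs, mask)
```

With this mask, the first and third heads pass through unchanged while every value of the second head becomes zero.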
The bare Reformer Model transformer outputting raw hidden-states without any specific head on top.