encoder_layerdrop (`float`, *optional*, defaults to 0.0):
    The LayerDrop probability for the encoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for
    more details.
decoder_layerdrop (`float`, *optional*, defaults to 0.0):
    The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for
    more details.
use_cache (`bool`, *optional*, defaults to `True`):
    Whether or not the model should return the last key/values attentions (not used by all models).
attention_type (`str`, *optional*, defaults to `"block_sparse"`):
    Which attention to use in the encoder: block sparse attention (complexity linear in the sequence length n),
    as introduced in the BigBird paper, or the original full attention layer (n^2 complexity). Possible values
    are `"original_full"` and `"block_sparse"`.
use_bias (`bool`, *optional*, defaults to `False`):
    Whether to use bias in the query, key, and value projections.
block_size (`int`, *optional*, defaults to 64):
    Size of each block. Used only when `attention_type == "block_sparse"`.
num_random_blocks (`int`, *optional*, defaults to 3):
    Number of random blocks each query attends to. Used only when `attention_type == "block_sparse"`.
scale_embeddings (`bool`, *optional*, defaults to `True`):
    Whether to rescale embeddings by `hidden_size ** 0.5`.

Example:

```python
>>> from transformers import BigBirdPegasusConfig, BigBirdPegasusModel

>>> # Initializing a BigBirdPegasus bigbird-pegasus-base style configuration
>>> configuration = BigBirdPegasusConfig()

>>> # Initializing a model (with random weights) from the bigbird-pegasus-base style configuration
>>> model = BigBirdPegasusModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
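
As a rough illustration of what `encoder_layerdrop` and `decoder_layerdrop` control, the following minimal sketch skips each layer of a stack with a given probability while training and never skips at inference. It uses only the standard library; `run_layers` and the toy layers are hypothetical names chosen for this example and are not part of `transformers`, whose real implementation operates on tensors and modules.

```python
import random


def run_layers(layers, x, layerdrop=0.0, training=True, rng=None):
    """Apply `layers` in order, skipping each one with probability `layerdrop` while training.

    Hypothetical helper for illustration only; not a `transformers` API.
    """
    rng = rng or random.Random()
    applied = []
    for index, layer in enumerate(layers):
        # Draw one uniform sample per layer; a layer can only be skipped in training mode.
        if training and rng.random() < layerdrop:
            continue
        x = layer(x)
        applied.append(index)
    return x, applied


# Six toy "layers" that each add 1, so the output counts how many layers ran.
layers = [lambda v: v + 1 for _ in range(6)]

# layerdrop=0.0 (the documented default): every layer runs.
out_default, applied_default = run_layers(layers, 0, layerdrop=0.0)

# Inference mode: layers are never skipped, whatever the probability.
out_eval, applied_eval = run_layers(layers, 0, layerdrop=0.5, training=False)

# Training with layerdrop=0.5: each layer is skipped with probability 0.5.
out_train, applied_train = run_layers(layers, 0, layerdrop=0.5, rng=random.Random(0))

# layerdrop=1.0 while training: every layer is skipped and the input passes through.
out_all_dropped, applied_all_dropped = run_layers(layers, 0, layerdrop=1.0)
```

With the default of 0.0 the stack behaves like an ordinary layer stack; a non-zero value only changes behavior during training.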