if `num_static_categorical_features` is > 0. d_model (`int`, *optional*, defaults to 64): Dimensionality of the transformer layers. encoder_layers (`int`, *optional*, defaults to 2): Number of encoder layers. decoder_layers (`int`, *optional*, defaults to 2): Number of decoder layers. encoder_attention_heads (`int`, *optional*, defaults to 2): Number of attention heads for each attention layer in the Transformer encoder. decoder_attention_heads (`int`, *optional*, defaults to 2): Number of attention heads for each attention layer in the Transformer decoder. encoder_ffn_dim (`int`, *optional*, defaults to 32): Dimension of the "intermediate" (often named feed-forward) layer in encoder. decoder_ffn_dim (`int`, *optional*, defaults to 32): Dimension of the "intermediate" (often named feed-forward) layer in decoder. activation_function (`str` or `function`, *optional*, defaults to `"gelu"`): The non-linear activation function (function or string) in the encoder and decoder. If string, `"gelu"` and `"relu"` are supported. dropout (`float`, *optional*, defaults to 0.1): The dropout probability for all fully connected layers in the encoder, and decoder. encoder_layerdrop (`float`, *optional*, defaults to 0.1): The dropout probability for the attention and fully connected layers for each encoder layer. decoder_layerdrop (`float`, *optional*, defaults to 0.1): The dropout probability for the attention and fully connected layers for each decoder layer. attention_dropout (`float`, *optional*, defaults to 0.1): The dropout probability for the attention probabilities. activation_dropout (`float`, *optional*, defaults to 0.1): The dropout probability used between the two layers of the feed-forward networks. num_parallel_samples (`int`, *optional*, defaults to 100): The number of samples to generate in parallel for each time step of inference. init_std (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated normal weight initialization distribution. use_cache (`bool`, *optional*, defaults to `True`): Whether to use the past key/values attentions (if applicable to the model) to speed up decoding. attention_type (`str`, *optional*, defaults to "prob"): Attention used in encoder. This can be set to "prob" (Informer's ProbAttention) or "full" (vanilla transformer's canonical self-attention). sampling_factor (`int`, *optional*, defaults to 5): ProbSparse sampling factor (only makes affect when `attention_type`="prob"). It is used to control the reduced query matrix (Q_reduce) input length. distil (`bool`, *optional*, defaults to `True`): Whether to use distilling in encoder. Example: ```python >>> from transformers import InformerConfig, InformerModel >>> # Initializing an Informer configuration with 12 time steps for prediction >>> configuration = InformerConfig(prediction_length=12) >>> # Randomly initializing a model (with random weights) from the configuration >>> model = InformerModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```ZinformerÚ