ional*): Whether or not to scale the residuals in the conditioner block. Since the top level prior does not have a conditioner, the default value is to None and should not be modified. num_layers (`int`, *optional*, defaults to 72): Number of layers of the transformer architecture. emb_dropout (`int`, *optional*, defaults to 0): Embedding dropout used in the lyric decoder. encoder_config (`JukeboxPriorConfig`, *optional*) : Configuration of the encoder which models the prior on the lyrics. encoder_loss_fraction (`float`, *optional*, defaults to 0.4): Multiplication factor used in front of the lyric encoder loss. hidden_size (`int`, *optional*, defaults to 2048): Hidden dimension of the attention layers. init_scale (`float`, *optional*, defaults to 0.2): Initialization scales for the prior modules. is_encoder_decoder (`bool`, *optional*, defaults to `True`): Whether or not the prior is an encoder-decoder model. In case it is not, and `nb_relevant_lyric_tokens` is greater than 0, the `encoder` args should be specified for the lyric encoding. mask (`bool`, *optional*, defaults to `False`): Whether or not to mask the previous positions in the attention. max_duration (`int`, *optional*, defaults to 600): Maximum supported duration of the generated song in seconds. max_nb_genres (`int`, *optional*, defaults to 1): Maximum number of genres that can be used to condition the model. merged_decoder (`bool`, *optional*, defaults to `True`): Whether or not the decoder and the encoder inputs are merged. This is used for the separated encoder-decoder architecture metadata_conditioning (`bool`, *optional*, defaults to `True)`: Whether or not to condition on the artist and genre metadata. metadata_dims (`List[int]`, *optional*, defaults to `[604, 7898]`): Number of genres and the number of artists that were used to train the embedding layers of the prior models. min_duration (`int`, *optional*, defaults to 0): Minimum duration of the generated audio on which the model was trained. mlp_multiplier (`float`, *optional*, defaults to 1.0): Multiplier coefficient used to define the hidden dimension of the MLP layers. 0.25 means that 0.25*width of the model will be used. music_vocab_size (`int`, *optional*, defaults to 2048): Number of different music tokens. Should be similar to the `JukeboxVQVAEConfig.nb_discrete_codes`. n_ctx (`int`, *optional*, defaults to 6144): Number of context tokens for each prior. The context tokens are the music tokens that are attended to when generating music tokens. n_heads (`int`, *optional*, defaults to 2): Number of attention heads. nb_relevant_lyric_tokens (`int`, *optional*, defaults to 384): Number of lyric tokens that are used when sampling a single window of length `n_ctx` res_conv_depth (`int`, *optional*, defaults to 3): Depth of the `JukeboxDecoderConvBock` used to upsample the previously sampled audio in the `JukeboxMusicTokenConditioner`. res_conv_width (`int`, *optional*, defaults to 128): Width of the `JukeboxDecoderConvBock` used to upsample the previously sampled audio in the `JukeboxMusicTokenConditioner`. res_convolution_multiplier (`int`, *optional*, defaults to 1): Multiplier used to scale the `hidden_dim` of the `JukeboxResConv1DBlock`. res_dilation_cycle (`int`, *optional*): Dilation cycle used to define the `JukeboxMusicTokenConditioner`. Usually similar to the ones used in the corresponding level of the VQVAE. The first prior does not use it as it is not conditioned on upper level tokens. res_dilation_growth_rate (`int`, *optional*, defaults to 1): Dilation grow rate used between each convolutionnal block of the `JukeboxMusicTokenConditioner` res_downs_t (`List[int]`, *optional*, defaults to `[3, 2, 2]`): Downsampling rates used in the audio conditioning network res_strides_t (`List[int]`, *optional*, defaults to `[2, 2, 2]`): Striding used in the audio conditioning network resid_dropout (`int`, *optional*, defaults to 0): Residual dropout used in the attention pattern. sampling_rate (`int`, *optional*, defaults to 44100): Sampling rate used for training. spread (`int`, *optional*): Spread used in the `summary_spread_attention` pattern timing_dims (`int`, *optional*, defaults to 64): Dimension of the timing embedding. zero_out (`bool`, *optional*, defaults to `False`): Whether or not to zero out convolution weights when initializing. Z jukebox_priorZ