inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the documentation from [`PretrainedConfig`] for more information. Args: act_fn (`str`, *optional*, defaults to `"relu"`): Activation function of the model. nb_discrete_codes (`int`, *optional*, defaults to 2048): Number of codes of the VQVAE. commit (`float`, *optional*, defaults to 0.02): Commit loss multiplier. conv_input_shape (`int`, *optional*, defaults to 1): Number of audio channels. conv_res_scale (`bool`, *optional*, defaults to `False`): Whether or not to scale the residuals of the `JukeboxResConv1DBlock`. embed_dim (`int`, *optional*, defaults to 64): Embedding dimension of the codebook vectors. hop_fraction (`List[int]`, *optional*, defaults to `[0.125, 0.5, 0.5]`): Fraction of non-intersecting window used when continuing the sampling process. levels (`int`, *optional*, defaults to 3): Number of hierarchical levels that used in the VQVAE. lmu (`float`, *optional*, defaults to 0.99): Used in the codebook update, exponential moving average coefficient. For more detail refer to Appendix A.1 of the original [VQVAE paper](https://arxiv.org/pdf/1711.00937v2.pdf) multipliers (`List[int]`, *optional*, defaults to `[2, 1, 1]`): Depth and width multipliers used for each level. Used on the `res_conv_width` and `res_conv_depth` res_conv_depth (`int`, *optional*, defaults to 4): Depth of the encoder and decoder block. If no `multipliers` are used, this is the same for each level. res_conv_width (`int`, *optional*, defaults to 32): Width of the encoder and decoder block. If no `multipliers` are used, this is the same for each level. res_convolution_multiplier (`int`, *optional*, defaults to 1): Scaling factor of the hidden dimension used in the `JukeboxResConv1DBlock`. res_dilation_cycle (`int`, *optional*): Dilation cycle value used in the `JukeboxResnet`. If an int is used, each new Conv1 block will have a depth reduced by a power of `res_dilation_cycle`. res_dilation_growth_rate (`int`, *optional*, defaults to 3): Resnet dilation growth rate used in the VQVAE (dilation_growth_rate ** depth) res_downs_t (`List[int]`, *optional*, defaults to `[3, 2, 2]`): Downsampling rate for each level of the hierarchical VQ-VAE. res_strides_t (`List[int]`, *optional*, defaults to `[2, 2, 2]`): Stride used for each level of the hierarchical VQ-VAE. sample_length (`int`, *optional*, defaults to 1058304): Provides the max input shape of the VQVAE. Is used to compute the input shape of each level. init_scale (`float`, *optional*, defaults to 0.2): Initialization scale. zero_out (`bool`, *optional*, defaults to `False`): Whether or not to zero out convolution weights when initializing. Z jukebox_vqvaeZ