lity of the encoder layers and the pooler layer. num_hidden_layers (`int`, *optional*, defaults to 12): Number of hidden layers in the Transformer encoder. num_attention_heads (`int`, *optional*, defaults to 12): Number of attention heads for each attention layer in the Transformer encoder. intermediate_size (`int`, *optional*, defaults to 3072): Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder. hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`): The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported. hidden_dropout_prob (`float`, *optional*, defaults to 0.1): The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler. attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1): The dropout ratio for the attention probabilities. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. layer_norm_eps (`float`, *optional*, defaults to 1e-12): The epsilon used by the layer normalization layers. qkv_bias (`bool`, *optional*, defaults to `True`): Whether to add a bias to the queries, keys and values. use_cls_token (`bool`, *optional*, defaults to `True`): Whether to use an extra CLS token for multimodal settings. Usually needed by the FLAVA model. Example: ```python >>> from transformers import FlavaMultimodalConfig, FlavaMultimodalModel >>> # Initializing a FlavaMultimodalModel with style configuration >>> configuration = FlavaMultimodalConfig() >>> # Initializing a FlavaMultimodalModel model (with random weights) from the style configuration >>> model = FlavaMultimodalModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```Z