r. num_attention_heads (`int`, *optional*, defaults to 12): Number of attention heads for each attention layer in the Transformer encoder. image_size (`int`, *optional*, defaults to 224): The size (resolution) of each image. patch_size (`int`, *optional*, defaults to 32): The size (resolution) of each patch. hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`): The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` ``"gelu"` are supported. layer_norm_eps (`float`, *optional*, defaults to 1e-5): The epsilon used by the layer normalization layers. attention_dropout (`float`, *optional*, defaults to 0.0): The dropout ratio for the attention probabilities. initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices. Example: ```python >>> from transformers import BlipVisionConfig, BlipVisionModel >>> # Initializing a BlipVisionConfig with Salesforce/blip-vqa-base style configuration >>> configuration = BlipVisionConfig() >>> # Initializing a BlipVisionModel (with random weights) from the Salesforce/blip-vqa-base style configuration >>> model = BlipVisionModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```Z