    probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
    The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 1e-10):
    The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
initializer_factor (`float`, *optional*, defaults to 1):
    A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
    testing).
seq_len (`int`, *optional*, defaults to 4096):
    Maximum sequence length (here, the number of patches) supported by the model.
relative_attention_num_buckets (`int`, *optional*, defaults to 32):
    The number of buckets to use for each attention layer.
relative_attention_max_distance (`int`, *optional*, defaults to 128):
    The maximum distance (in tokens) to use for each attention layer.

Example:

```python
>>> from transformers import Pix2StructVisionConfig, Pix2StructVisionModel

>>> # Initializing a Pix2StructVisionConfig with google/pix2struct-base style configuration
>>> configuration = Pix2StructVisionConfig()

>>> # Initializing a Pix2StructVisionModel (with random weights) from the google/pix2struct-base style configuration
>>> model = Pix2StructVisionModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
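
The arguments documented above can also be overridden when the configuration is created. The following is a minimal sketch, assuming each documented argument is accepted as a keyword argument and exposed as an attribute of the same name; the values are illustrative only, not recommended settings.

```python
from transformers import Pix2StructVisionConfig

# Illustrative values only: a shorter maximum patch sequence and a
# non-zero attention dropout. Arguments not passed keep their defaults.
config = Pix2StructVisionConfig(
    seq_len=2048,
    attention_dropout=0.1,
)

# Overridden values are stored on the configuration object.
print(config.seq_len, config.attention_dropout)

# Arguments that were not passed keep their documented defaults.
print(config.relative_attention_num_buckets, config.relative_attention_max_distance)
```

Building the configuration alone does not instantiate any model weights, so it is a cheap way to check which values a model would be created with.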