0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        bypass_transformer (`bool`, *optional*, defaults to `False`):
            Whether or not the model should bypass the transformer for the visual embeddings. If set to `True`, the
            model directly concatenates the visual embeddings from [`VisualBertEmbeddings`] with the text output from
            the transformer, and then passes the result to a self-attention layer.
        special_visual_initialize (`bool`, *optional*, defaults to `True`):
            Whether or not the visual token type and position type embedding weights should be initialized the same as
            the textual token type and position type embeddings. When set to `True`, the weights of the textual token
            type and position type embeddings are copied to the respective visual embedding layers.


    Example:

    ```python
    >>> from transformers import VisualBertConfig, VisualBertModel

    >>> # Initializing a VisualBERT visualbert-vqa-coco-pre style configuration
    >>> configuration = VisualBertConfig.from_pretrained("uclanlp/visualbert-vqa-coco-pre")

    >>> # Initializing a model (with random weights) from the visualbert-vqa-coco-pre style configuration
    >>> model = VisualBertModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```
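
    A minimal sketch of overriding the two visual-embedding flags documented above, assuming every other argument
    keeps its default value. The variable name `custom_configuration` is illustrative only:

    ```python
    from transformers import VisualBertConfig

    # Illustrative only: override the two visual-embedding flags and keep all other defaults.
    custom_configuration = VisualBertConfig(
        bypass_transformer=True,  # concatenate visual embeddings with the text output
        special_visual_initialize=False,  # do not copy the textual embedding weights
    )

    # The chosen values are stored as plain attributes on the configuration object.
    print(custom_configuration.bypass_transformer, custom_configuration.special_visual_initialize)
    ```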