ion necessary to initiate the vision model. Can be either: - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`. - A path to a *directory* containing model weights saved using [`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`. - A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt` should be set to `True` and a configuration object should be provided as `config` argument. text_model_name_or_path (`str`, *optional*): Information necessary to initiate the text model. Can be either: - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`. - A path to a *directory* containing model weights saved using [`~TFPreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`. - A path or url to a *PyTorch checkpoint folder* (e.g, `./pt_model`). In this case, `from_pt` should be set to `True` and a configuration object should be provided as `config` argument. model_args (remaining positional arguments, *optional*): All remaning positional arguments will be passed to the underlying model's `__init__` method. kwargs (remaining dictionary of keyword arguments, *optional*): Can be used to update the configuration object (after it being loaded) and initiate the model (e.g., `output_attentions=True`). - To update the text configuration, use the prefix *text_* for each configuration parameter. - To update the vision configuration, use the prefix *vision_* for each configuration parameter. - To update the parent model configuration, do not use a prefix for each configuration parameter. Behaves differently depending on whether a `config` is provided or automatically loaded. Example: ```python >>> from transformers import TFVisionTextDualEncoderModel >>> # initialize a model from pretrained ViT and BERT models. Note that the projection layers will be randomly initialized. >>> model = TFVisionTextDualEncoderModel.from_vision_text_pretrained( ... "google/vit-base-patch16-224", "bert-base-uncased" ... ) >>> # saving model after fine-tuning >>> model.save_pretrained("./vit-bert") >>> # load fine-tuned model >>> model = TFVisionTextDualEncoderModel.from_pretrained("./vit-bert") ```c