The number of images to use for natural language visual reasoning. If set to a positive integer, will be used by [`ViltForImagesAndTextClassification`] for defining the classifier head. Example: ```python >>> from transformers import ViLTModel, ViLTConfig >>> # Initializing a ViLT dandelin/vilt-b32-mlm style configuration >>> configuration = ViLTConfig() >>> # Initializing a model from the dandelin/vilt-b32-mlm style configuration >>> model = ViLTModel(configuration) >>> # Accessing the model configuration >>> configuration = model.config ```Z