tention layers. num_pos_feats (`int`, *optional*, defaults to 128): The dimensionality of the position embedding. mlp_dim (`int`, *optional*, defaults to None): The dimensionality of the MLP layer in the Transformer encoder. If `None`, defaults to `mlp_ratio * hidden_size`. i