image_tokens (`tf.Tensor`): image tokens, of shape [batch_size, input_length, channels] group_tokens (`tf.Tensor`): group tokens, [batch_size, num_group_tokens, channels] N)