outputs = model(**inputs)
>>> logits_per_video = outputs.logits_per_video  # video-text similarity scores
>>> probs = logits_per_video.softmax(dim=1)  # softmax over the candidate texts gives the label probabilities
>>> print(probs)
tensor([[1.9496e-04, 9.9960e-01, 2.0825e-04]])
```
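The printed probabilities have one row per video and one column per candidate text, so the predicted label is the column with the highest value. The following is a minimal sketch of that lookup in plain Python, using the values printed above. The `labels` list and the `top_label` helper are hypothetical placeholders for illustration, not part of the library; substitute the text prompts that were actually passed to the processor.

```python
# Probabilities copied from the printed tensor above:
# one video (row), three candidate texts (columns).
probs = [[1.9496e-04, 9.9960e-01, 2.0825e-04]]

# Hypothetical placeholder names; replace with the real candidate texts.
labels = ["label_0", "label_1", "label_2"]


def top_label(row, labels):
    """Return the (label, probability) pair with the highest probability."""
    best = max(range(len(row)), key=row.__getitem__)
    return labels[best], row[best]


for row in probs:
    label, prob = top_label(row, labels)
    print(f"{label}: {prob:.4f}")  # prints "label_1: 0.9996"
```

With a real `probs` tensor, `probs.tolist()` should give the nested list used here.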