nds read_video_stream (int, optional): whether read video stream. If yes, set to 1. Otherwise, 0 video_width/video_height/video_min_dimension/video_max_dimension (int): together decide the size of decoded frames: - When video_width = 0, video_height = 0, video_min_dimension = 0, and video_max_dimension = 0, keep the original frame resolution - When video_width = 0, video_height = 0, video_min_dimension != 0, and video_max_dimension = 0, keep the aspect ratio and resize the frame so that shorter edge size is video_min_dimension - When video_width = 0, video_height = 0, video_min_dimension = 0, and video_max_dimension != 0, keep the aspect ratio and resize the frame so that longer edge size is video_max_dimension - When video_width = 0, video_height = 0, video_min_dimension != 0, and video_max_dimension != 0, resize the frame so that shorter edge size is video_min_dimension, and longer edge size is video_max_dimension. The aspect ratio may not be preserved - When video_width = 0, video_height != 0, video_min_dimension = 0, and video_max_dimension = 0, keep the aspect ratio and resize the frame so that frame video_height is $video_height - When video_width != 0, video_height == 0, video_min_dimension = 0, and video_max_dimension = 0, keep the aspect ratio and resize the frame so that frame video_width is $video_width - When video_width != 0, video_height != 0, video_min_dimension = 0, and video_max_dimension = 0, resize the frame so that frame video_width and video_height are set to $video_width and $video_height, respectively video_pts_range (list(int), optional): the start and end presentation timestamp of video stream video_timebase (Fraction, optional): a Fraction rational number which denotes timebase in video stream read_audio_stream (int, optional): whether read audio stream. If yes, set to 1. Otherwise, 0 audio_samples (int, optional): audio sampling rate audio_channels (int optional): audio channels audio_pts_range (list(int), optional): the start and end presentation timestamp of audio stream audio_timebase (Fraction, optional): a Fraction rational number which denotes time base in audio stream Returns vframes (Tensor[T, H, W, C]): the `T` video frames aframes (Tensor[L, K]): the audio frames, where `L` is the number of points and `K` is the number of audio_channels info (Dict): metadata for the video and audio. Can contain the fields video_fps (float) and audio_fps (int) r