isperFeatureExtractor.__call__`] attention_mask (`numpy.ndarray` of shape `(batch_size, sequence_length)`, *optional*): Whisper does not support masking of the `input_features`, this argument is preserved for compatibility, but is not used. By default the silence in the input log mel spectrogram are ignored. decoder_input_ids (`numpy.ndarray` of shape `(batch_size, target_sequence_length)`, *optional*): Indices of decoder input sequence tokens in the vocabulary. Indices can be obtained using [`WhisperTokenizer`]. See [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details. [What are decoder input IDs?](../glossary#decoder-input-ids) Whisper uses the `decoder_start_token_id` as the starting token for `decoder_input_ids` generation. decoder_attention_mask (`numpy.ndarray` of shape `(batch_size, target_sequence_length)`, *optional*): Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also be used by default. If you want to change padding behavior, you should modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more information on the default strategy. position_ids (`numpy.ndarray` of shape `(batch_size, sequence_length)`, *optional*): Whisper does not use `position_ids` in the encoder as `input_features` is always the same size and doesn't use masking, but this argument is preserved for compatibility. By default the silence in the input log mel spectrogram are ignored. decoder_position_ids (`numpy.ndarray` of shape `(batch_size, sequence_length)`, *optional*): Indices of positions of each decoder input sequence tokens in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. output_attentions (`bool`, *optional*): Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail. output_hidden_states (`bool`, *optional*): Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail. return_dict (`bool`, *optional*): Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. aĠ