trings (pretokenized), you must set `is_split_into_words=True` (to lift the ambiguity with a batch of sequences). voice_preset (`str`, `Dict[np.ndarray]`): The voice preset, i.e the speaker embeddings. It can either be a valid voice_preset name, e.g `"en_speaker_1"`, or directly a dictionnary of `np.ndarray` embeddings for each submodel of `Bark`. Or it can be a valid file name of a local `.npz` single voice preset. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors of a particular framework. Acceptable values are: - `'pt'`: Return PyTorch `torch.Tensor` objects. - `'np'`: Return NumPy `np.ndarray` objects. Returns: Tuple([`BatchEncoding`], [`BatchFeature`]): A tuple composed of a [`BatchEncoding`], i.e the output of the `tokenizer` and a [`BatchFeature`], i.e the voice preset with the right tensors type. Nz