d cross attention.

Split present state from grouped by layer to grouped by self/cross attention.

Before:
    (past_key_self_0, past_value_self_0, past_key_cross_0, past_value_cross_0),
    (past_key_self_1, past_value_self_1, past_key_cross_1, past_value_cross_1), ...
After:
    (past_key_self_0, past_value_self_0, past_key_self_1, past_value_self_1, ...),
    (past_key_cross_0, past_value_cross_0, past_key_cross_1, past_value_cross_1, ...)

Args:
    present_key_values: The past_key_values output of a model, grouped by layer.
    concat: Whether to concatenate the self attention and cross attention key/values in the return value.

Returns:
    present_self (Tuple[torch.Tensor]): present keys and values from self attention
    present_cross (Tuple[torch.Tensor]): present keys and values from cross attention

r
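
The regrouping described above can be sketched as follows. This is a minimal illustration, not the original implementation: the name `split_self_and_cross` is hypothetical, the behavior of `concat` (returning one flat tuple with the self attention entries first) is an assumption based on the Args description, and plain strings stand in for `torch.Tensor` values so the example runs without PyTorch.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # stands in for torch.Tensor; any object works in this sketch


def split_self_and_cross(present_key_values: Sequence[Tuple[T, T, T, T]], concat: bool = False):
    """Hypothetical helper: regroup per-layer states by attention type."""
    present_self: List[T] = []
    present_cross: List[T] = []
    for layer in present_key_values:
        # Each layer is expected to hold (key_self, value_self, key_cross, value_cross).
        if len(layer) != 4:
            raise ValueError(f"expected 4 items per layer, got {len(layer)}")
        key_self, value_self, key_cross, value_cross = layer
        present_self.extend([key_self, value_self])
        present_cross.extend([key_cross, value_cross])
    if concat:
        # Assumed meaning of `concat`: one flat tuple, self attention entries first.
        return tuple(present_self + present_cross)
    return tuple(present_self), tuple(present_cross)


# Two layers, with strings as placeholders for tensors.
layers = [("ks0", "vs0", "kc0", "vc0"), ("ks1", "vs1", "kc1", "vc1")]
present_self, present_cross = split_self_and_cross(layers)
```

With the two placeholder layers, `present_self` holds the self attention keys and values in layer order and `present_cross` holds the cross attention ones, matching the "After" layout.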