with sampling. Refer to the Megatron-LM repo for more details.

Args:
    inputs (torch.Tensor): input ids
    attention_mask (torch.Tensor, optional): attention mask. Defaults to None.
    max_length (int, optional): max length of the generated sequence. Defaults to None.
        Either this or max_new_tokens should be provided.
    max_new_tokens (int, optional): max number of tokens to be generated. Defaults to None.
        Either this or max_length should be provided.
    num_beams (int, optional): number of beams to use for beam search. Defaults to None.
    temperature (float, optional): temperature for sampling. Defaults to 1.0.
    top_k (int, optional): top k tokens to consider for sampling. Defaults to 0.0.
    top_p (float, optional): tokens in top p probability are considered for sampling. Defaults to 0.0.
    length_penalty (float, optional): length penalty for beam search. Defaults to None.
    kwargs: additional key-value arguments
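
Example:

The sampling arguments above (temperature, top_k, top_p) can be illustrated with a minimal, framework-free sketch. This is not the Megatron-LM implementation: `filtered_probs` is a hypothetical helper, it works on a plain list of logits for a single position, and it assumes that `top_k == 0` and `top_p == 0.0` mean "filter disabled", which is how the defaults listed above are usually read.

```python
import math


def filtered_probs(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Return sampling probabilities after temperature, top-k and top-p filtering.

    Hypothetical helper for illustration only.
    Assumption: top_k == 0 and top_p == 0.0 disable the respective filter.
    """
    # Temperature scaling: values below 1.0 sharpen the distribution,
    # values above 1.0 flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ordered from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)

    # Top-k: keep only the k most probable tokens.
    if top_k > 0:
        keep &= set(order[:top_k])

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p > 0.0:
        nucleus, cumulative = set(), 0.0
        for i in order:
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus

    # Zero out the filtered tokens and renormalise the rest.
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]
```

A next token could then be drawn with `random.choices(range(len(p)), weights=p)`, where `p` is the list returned by `filtered_probs`. Beam search (num_beams, length_penalty) follows a different code path and is not covered by this sketch.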