ng of the sequence by setting `cur_len = 0` (number of tokens processed so far). 1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be masked) 2. Reserve a context of length `context_length = span_length / plm_probability` to surround span to be masked 3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length - span_length]` and mask tokens `start_index:start_index + span_length` 4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in the sequence to be processed), repeat from Step 1. r