ce in the code.)

IMPORTANT: Differences in "detach" calls between the original forward and the
           recomputation are expected. They are introduced by the checkpointing
           mechanism and can be ignored.

Operations executed during the original forward:

{forward_ops}

Operations executed during recomputation:

{recompute_ops}

+------------------------------------------------------------------------------+
ERROR: Detected non-determinism while running activation checkpointing

You are seeing this error because you passed `debug=True` to checkpoint and
the tensors saved during the original forward differ from those saved during
recomputation. This can happen if different operators were run in the
original forward and in the recomputation.

To identify where the mismatch may be coming from, you can do the following:

1) Compare the operators that ran during the original forward and during
   recomputation to see where they differ. These operators are printed above
   in the order they were executed.

2) Review the stack trace for each operator to locate its invocation source.
   The stack traces are printed in the order the operators were executed.

Note that the logs can be quite long. Here's how they are structured:
(Tip: you can Ctrl-f for these headers)

1. Stack traces of the operators that ran in the original forward
2. Stack traces of the operators that ran during recomputation
3. Log of operators in the original forward and recomputation
4. Error message                                             <--- You are here
--------------------------------------------------------------------------------
c