nd finally we transpose the dim 0 and dim 2 again. 6. If max_norm is specified, we manually sum up the norm and renorm. Because the renorm must be in place, we need to override the local_shard to mimic this behavior. r