unt 2 1 2 :: Min 1.8610 3.7240 2 :: Mean 3.4325 3.7240 2 :: Max 5.0040 3.7240 2 :: Stddev 1.5715 0.0000 9 :: Count 111696 111694 9 :: Min 0.5230 0.4110 9 :: Mean 10.5831 10.5718 9 :: Max 38.4480 43.2900 9 :: Stddev 1.1147 1.1325 Mean time for order-1 allocations is reduced. order-2 looks increased but with so few allocations, it's not particularly significant. THP mean allocation latency is also reduced. That said, allocation time varies so significantly that the reductions are within noise. Max allocation time is reduced by a significant amount for low-order allocations but reduced for THP allocations which presumably are now breaking before reclaim has done enough work. SysBench Highorder Allocation Latency Statistics fix-infinite break-early 1 :: Count 15745 15677 1 :: Min 0.4250 0.4550 1 :: Mean 1.1023 1.0810 1 :: Max 14.4590 10.8220 1 :: Stddev 0.5117 0.5100 2 :: Count 1 1 2 :: Min 3.0040 2.1530 2 :: Mean 3.0040 2.1530 2 :: Max 3.0040 2.1530 2 :: Stddev 0.0000 0.0000 9 :: Count 2017 1931 9 :: Min 0.4980 0.7480 9 :: Mean 10.4717 10.3840 9 :: Max 24.9460 26.2500 9 :: Stddev 1.1726 1.1966 Again, mean time for order-1 allocations is reduced while order-2 allocations are too few to draw conclusions from. The mean time for THP allocations is also slightly reduced albeit the reductions are within varianes. Once again, our maximum allocation time is significantly reduced for low-order allocations and slightly increased for THP allocations. Anon stream mmap reference Highorder Allocation Latency Statistics 1 :: Count 1376 1790 1 :: Min 0.4940 0.5010 1 :: Mean 1.0289 0.9732 1 :: Max 6.2670 4.2540 1 :: Stddev 0.4142 0.2785 2 :: Count 1 - 2 :: Min 1.9060 - 2 :: Mean 1.9060 - 2 :: Max 1.9060 - 2 :: Stddev 0.0000 - 9 :: Count 11266 11257 9 :: Min 0.4990 0.4940 9 :: Mean 27250.4669 24256.1919 9 :: Max 11439211.0000 6008885.0000 9 :: Stddev 226427.4624 186298.1430 This benchmark creates one thread per CPU which references an amount of anonymous memory 1.5 times the size of physical RAM. This pounds swap quite heavily and is intended to exercise THP a bit. Mean allocation time for order-1 is reduced as before. It's also reduced for THP allocations but the variations here are pretty massive due to swap. As before, maximum allocation times are significantly reduced. Overall, the patch reduces the mean and maximum allocation latencies for the smaller high-order allocations. This was with Slab configured so it would be expected to be more significant with Slub which uses these size allocations more aggressively. The mean allocation times for THP allocations are also slightly reduced. The maximum latency was slightly increased as predicted by the comments due to reclaim/compaction breaking early. However, workloads care more about the latency of lower-order allocations than THP so it's an acceptable trade-off. Signed-off-by: Mel Gorman Acked-by: Andrea Arcangeli Acked-by: Johannes Weiner Reviewed-by: Minchan Kim Acked-by: Andrea Arcangeli Acked-by: Rik van Riel Cc: Michal Hocko Cc: Kent Overstreet Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds >0Ex