clear improvement from this: (together with the earlier patches in this series, especially the split first 2MB patch) [lower is better] no split stddev split stddev delta Elapsed Time 87.146 (0.727516) 84.296 (1.09098) -3.2% User Time 274.537 (4.05226) 273.692 (3.34344) -0.3% System Time 34.907 (0.42492) 34.508 (0.26832) -1.1% Percent CPU 322.5 (38.3007) 326.5 (44.5128) +1.2% => About 3.2% improvement in elapsed time for kernbench. With GB pages on AMD Fam1h the impact of splitting is much higher of course, since it would split two full GB pages (together with the first 1MB split patch) instead of two 2MB pages. I could not benchmark a clear difference in kernbench on gbpages, so I kept it disabled for that case That was only limited benchmarking of course, so if someone was interested in running more tests for the gbpages case that could be revisited (contributions welcome) I didn't bother implementing this for 32bit because it is very unlikely the 32bit lowmem mapping overlaps into the TSEG near 4GB and the 2MB low split is already handled for both. [ mingo@elte.hu: do it on gbpages kernels too, there's no clear reason why it shouldnt help there. ] Signed-off-by: Andi Kleen Acked-by: andreas.herrmann3@amd.com Cc: mingo@elte.hu Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar ^ρΣԐ&x