ust one feature section which gives us better performance and doesn't hurt readability which two separate feature sections did. Moving the andi. to just above the branch doesn't have any noticeable negative effect on the remaining 64bit processors (the ones that didn't have this feature bit added). On Cell this simple modification results in an improvement to measured memcpy() bandwidth of up to 50% in the hot cache case and up to 15% in the cold cache case. On Power6 we get memory bandwidth results that are up to three times faster in the hot cache case and up to 50% faster in the cold cache case. Commit 2a9294369bd020db89bfdf78b84c3615b39a5c84 ("powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ") was where CPU_FTR_CP_USE_DCBTZ was added. To say that Cell gets its best performance in memcpy() with just the destination aligned is true but only for the reason that the indirect shift and rotate instructions, sld and srd, are microcoded on Cell. This means that either the destination or the source can be aligned, but not both, and seeing as we get better performance with the destination aligned we choose this option. While we're at it make a one line change from cmpldi r1,... to cmpldi cr1,... for consistency. Signed-off-by: Mark Nelson Signed-off-by: Paul Mackerras ävD¶™-x