that can be
done.  But for 4.1 and newer, there is a 'Z' constraint.  It does not allow
"updating" addresses, but does allow both indexed and offset addresses.
However, the only allowed constant offset is 0.  We can then use the
undocumented 'y' operand modifier, which causes gcc to convert "0(reg)"
into the equivilient "0,reg" format that can be used with stwbrx.

This brings us the to problem with the BE version.  In this case, the "stw"
instruction does have both indexed and non-indexed versions.  The final asm
ends up looking like this:
asm("sync; stw%U0%X0 %1,%0" : "=m" (*addr) : "r" (val), "r" (addr));

The undocumented codes "%U0" and "%0X" will generate a 'u' if the memory
reference should be an auto-updating one, and an 'x' if the memory
reference is indexed, respectively.  The third operand is unused, it's just
there because asm the code is reused from the LE version.  However, gcc
does not know this, and generates unnecessary code to stick addr in a
register!  To use the example from the LE version, gcc will generate "add
11,4,3; stwx 9,4,3".  It is able to use the indexed address "4,3" for the
"stwx", but still thinks it needs to put 4+3 into register 11, which will
never be used.

This also ends up happening a lot for the offset addressing mode, where
common code like this:  out_be32(&device_registers->some_register, data);
uses an instruction like "stw 9, 42(3)", where register 3 has the pointer
device_registers and 42 is the offset of some_register in that structure.
gcc will be forced to generate the unnecessary instruction "addi 11, 3, 42"
to put the address into a single (unused) register.

The in_* versions end up having these exact same problems as well.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
	Ë