Is LEA valid with a negative scale, or SUB with a scaled register?

Question

I have two registers, each holding one variable:

%rdi = x, %rsi = y

I want to compute y = y - 4x.

My attempt subtracts x four times to get y - 4x:

subq %rdi, %rsi         # y = y - x
subq %rdi, %rsi         # y = y - 2x
subq %rdi, %rsi         # y = y - 3x
subq %rdi, %rsi         # y = y - 4x

But I want to do it in just one instruction.

I thought about using leaq in this situation:

leaq (%rsi, %rdi, 4), %rsi # y = y + 4x

But as you can see, this adds 4x instead of subtracting it.

So my questions are:

leaq (%rsi, %rdi, -4), %rsi

Would this be a valid operation? I know the scaling factor has to be a power of 2 like 1,2,4... but can it be a negative number like -4?

subq -4 * %rdi, %rsi

Would this also be valid? I want to know if I could perform a multiplication to a register value directly.

No, any assembler would have told you those don't assemble. And [Intel's manual for SUB](https://www.felixcloutier.com/x86/sub) could tell you why: its source operand can only be a register, memory, or an immediate, not a scaled register. This isn't ARM with a barrel-shifter for source operands. You can of course do it in 2 instructions with a temporary register, shift or LEA then sub. — Peter Cordes, Apr 11 '22 at 19:49
As for the negative scale in an addressing mode, unfortunately no on that as well; there's only a 2-bit shift count in machine code. [A couple of questions about \[base + index\*scale + disp\]](https://stackoverflow.com/q/27936196) Basically a duplicate of that, except for the question about `sub`. — Peter Cordes, Apr 11 '22 at 19:50
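The two-instruction approach from the first comment can be sketched in C to check that it matches the four-subq version. This is a minimal model, assuming `uint64_t` stands in for a 64-bit register so overflow wraps as it does in hardware; the function names are hypothetical. In assembly the second function would correspond to something like `leaq (,%rdi,4), %rax` / `subq %rax, %rsi` (temporary in %rax), or `shlq $2, %rdi` / `subq %rdi, %rsi` if x may be destroyed.

```c
#include <stdint.h>

/* Four separate subtractions, mirroring the four subq instructions. */
static uint64_t sub_four_times(uint64_t y, uint64_t x) {
    for (int i = 0; i < 4; i++) {
        y -= x;                 /* subq %rdi, %rsi */
    }
    return y;
}

/* Scale into a temporary first, then subtract once. */
static uint64_t shift_then_sub(uint64_t y, uint64_t x) {
    uint64_t tmp = x << 2;      /* tmp = 4x (modulo 2^64), like lea/shl */
    return y - tmp;             /* one sub: y - 4x */
}
```

Both functions wrap identically, e.g. y = 0, x = 1 gives 2^64 - 4 in each case.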

Brendan · Answer 1 · 2022-04-11T21:35:46.860

I know the scaling factor has to be a power of 2 like 1,2,4... but can it be a negative number like -4?

No. In machine code the scale is encoded as 2 bits in the "SIB" byte, which also has 3 bits for the index register and 3 bits for the base register (in 64-bit mode a REX prefix supplies a fourth bit for each), so there are no unused bits. The 2 scale bits are interpreted as a shift count (<< 0, << 1, << 2, << 3), which is equivalent to a multiplier of 1, 2, 4 or 8.
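A minimal C sketch of how the SIB byte splits into its three fields shows why a negative scale cannot be expressed: the 2-bit field can only produce 1, 2, 4 or 8. The struct and function names are hypothetical, and the sketch deliberately ignores the REX extension bits and the special encodings (index 100b meaning "no index", base 101b with mod=00 meaning disp32 with no base). For `leaq (%rsi, %rdi, 4), %rsi` the SIB byte is 0xBE (ss=10, index=rdi=111, base=rsi=110).

```c
#include <stdint.h>

/* Fields of a SIB byte: ss (bits 7-6), index (bits 5-3), base (bits 2-0). */
typedef struct {
    unsigned scale;  /* 1 << ss: only 1, 2, 4 or 8 are possible */
    unsigned index;  /* low 3 bits of the index register number */
    unsigned base;   /* low 3 bits of the base register number */
} sib_fields;

static sib_fields decode_sib(uint8_t sib) {
    sib_fields f;
    f.scale = 1u << (sib >> 6);   /* 2-bit shift count -> multiplier */
    f.index = (sib >> 3) & 7u;
    f.base  = sib & 7u;
    return f;
}
```

Iterating over all 256 possible byte values never yields any scale other than 1, 2, 4 or 8.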

I want to know if I could perform a multiplication to a register value directly.

The scale can't be applied directly to a register operand. Scaled indexing was added to make array access fast and easy, and it can only be used to compute an address: either to load the address itself with the lea instruction, or to access memory.

There are other instructions that multiply a register value, though: shifts, mul, imul, and aad for general-purpose integer registers, plus more for floating-point and SIMD registers. If you want a "fused multiply and add", your choices are aad (8-bit only, with a constant multiplier, and not available in 64-bit mode), lea, and SIMD extensions.
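To make "lea as a fused multiply-add" concrete, here is a minimal C model of the arithmetic lea performs, base + index*scale + disp, computed modulo 2^64 without affecting any flags. The function name is hypothetical, and the assert encodes the encoding restriction to scales of 1, 2, 4 or 8. With it, `leaq (%rsi, %rdi, 4), %rsi` is `lea_model(y, x, 4, 0)`, and a classic trick like `leaq (%rdi, %rdi, 4), %rax` computes 5x.

```c
#include <assert.h>
#include <stdint.h>

/* Address arithmetic done by LEA: base + index*scale + disp (mod 2^64).
   Only the arithmetic is modeled; no memory is accessed. */
static uint64_t lea_model(uint64_t base, uint64_t index,
                          unsigned scale, int32_t disp) {
    /* The 2-bit scale field allows only these four multipliers. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* disp32 is sign-extended to 64 bits before the addition. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

Because the multiplier must be one of those four positive values, there is no way to ask lea for base - 4*index directly.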

subq -4 * %rdi, %rsi

In theory, it would be possible to invent a new assembly language for 80x86 machine code (in the same way that AT&T invented an assembly language for 80x86 that doesn't match Intel's), where the assembler can accept whatever syntax it likes, as long as it can convert its idea of "instructions" into machine code.

For example, a new assembly language could accept movq %rsi+%rdi*4, %rsi and convert it to the machine code for leaq (%rsi, %rdi, 4), %rsi. In the same way, because y - (-4 * x) is the same as y + 4 * x, a new assembler could accept subq -4 * %rdi, %rsi and convert it to the machine code for leaq (%rsi, %rdi, 4), %rsi, as long as no later instruction depends on the flags (sub sets them, lea does not).
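The equivalence above can be checked with a minimal C sketch, assuming `uint64_t` models a 64-bit register so all arithmetic wraps modulo 2^64 as in two's complement hardware. The function names are hypothetical. Only the resulting value is modeled; the flags that sub would set (and lea would not) are not.

```c
#include <stdint.h>

/* Value the hypothetical "subq -4 * %rdi, %rsi" would produce: y - (-4x). */
static uint64_t sub_neg4x(uint64_t y, uint64_t x) {
    uint64_t neg4x = (uint64_t)0 - (x << 2);   /* -4x modulo 2^64 */
    return y - neg4x;
}

/* Value produced by "leaq (%rsi, %rdi, 4), %rsi": y + 4x. */
static uint64_t lea_y_plus_4x(uint64_t y, uint64_t x) {
    return y + (x << 2);
}
```

The two agree even when 4x overflows 64 bits, which is why the substitution is valid for the register result.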

If you allow complex expressions that need to expand to multiple instructions, you now have a compiler, not an assembler, for a new language. You could *maybe* still call it an assembler if you limit it to expressions that can be done efficiently without a temporary register. With a `+4` for sub, it would be borderline: `subq 4 * %rdi, %rsi` could have to expand to `neg %rdi` / `lea` / `neg %rdi` (or double-`neg` the destination register, for even worse critical-path latency). Worse than mov/neg into a temporary, and much worse than if you can destroy the source input with just shl/sub. — Peter Cordes, Apr 12 '22 at 01:18
But good point that `rsi -= -4 * rdi` is just `rsi += rdi*4` and can be done with a single LEA except for FLAGS; the two hypothetical instructions in the question aren't equivalent! (I'd assumed they were meant to be and didn't look further.) — Peter Cordes, Apr 12 '22 at 01:21
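The two `neg`-based expansions of `subq 4 * %rdi, %rsi` mentioned in the comments rely on two's-complement identities. A minimal C sketch checks that both yield y - 4x; the function names are hypothetical, `uint64_t` models a 64-bit register, and the pointer parameter models %rdi being negated and then restored. It models values only, not latency or flags.

```c
#include <stdint.h>

/* neg %rdi / leaq (%rsi,%rdi,4), %rsi / neg %rdi */
static uint64_t via_neg_source(uint64_t y, uint64_t *x) {
    *x = (uint64_t)0 - *x;    /* neg %rdi: x becomes -x */
    y = y + (*x << 2);        /* lea: y + 4*(-x) == y - 4x */
    *x = (uint64_t)0 - *x;    /* neg %rdi: restore the original x */
    return y;
}

/* neg %rsi / leaq (%rsi,%rdi,4), %rsi / neg %rsi */
static uint64_t via_neg_dest(uint64_t y, uint64_t x) {
    y = (uint64_t)0 - y;      /* neg %rsi: -y */
    y = y + (x << 2);         /* lea: -y + 4x */
    return (uint64_t)0 - y;   /* neg %rsi: -(-y + 4x) == y - 4x */
}
```

The second form leaves x untouched but puts all three instructions on the dependency chain through y, matching the latency concern raised in the comment.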
