What 8086 instructions accept REP?

Question

I tried this code in my assembler, set to 16 bit mode:

bits 16
rep mov ds, ax

Surprisingly, no error was thrown. Is this even valid? Wasn't rep only supposed to work with string instructions? Is it just a very fancy way to write this?

mov ds, ax
xor cx, cx

So far this is a standard x86 programming question, not really retro specific. Any particular reason why it's not asked on Stack Overflow first? — Raffzahn, Jan 31 '21 at 13:59
Because I thought that this is a quirk of the 8086 that doesn't really apply to later processors. I was wrong. — DarkAtom, Jan 31 '21 at 14:04
rep does not set cx to zero when used on a non-string instruction. Even when used on unconditional string instructions (lods, stos, movs) the effect is to repeat until cx zero, but the instruction does not alter the status flags in the way a xor cx, cx would. Say, rep movsw is more similar to jcxz .skip \ .loop: \ movsw \ loop .loop \ .skip: — ecm, Jan 31 '21 at 20:33
To play devils' advocate: the question's fine, because it's asking about the behaviour of retro processors. The fact that Intel still exists is stuff that's happened outside of our site's scope, and so shouldn't affect it. — wizzwizz4, Feb 01 '21 at 21:42
@wizzwizz4 I agree. Just wondering why not asked first on a wider forum. Then again, many comments show that even around here people reflect way more on later developments of the x86 series, than the question really asked: Why does this (unnamed) Assembler swallow that code when set for 8086. A bit disturbing, isn't it? — Raffzahn, Feb 01 '21 at 22:29
@Raffzahn: bits 16 doesn't mean "set for 8086". The whole reason for x86's continued existence is backwards-compatibility: current x86 CPUs support real mode (and 16-bit sub-mode of protected and long modes). bits 16 is appropriate for all of those, and allows usage of MMX/SSE instructions; basically anything that doesn't need a VEX or EVEX prefix, along with encoding 32-bit operand and address size via prefixes. "8086 mode" would be YASM's CPU 8086 or similar instruction-set restriction / checking features in other assemblers, like MASM's .8086 I think. — Peter Cordes, Feb 02 '21 at 11:55
@Raffzahn: I'd be happy if this question were migrated to Stack Overflow, where it might be more appropriate (and could be edited to ask "what x86 instructions accept REP", or "why don't assemblers reject REP on non-string instructions?", which are the questions I thought were more interesting to answer, despite this being retrocomputing). Sorry if I stepped on any toes, but I think my answer has lots of stuff that's interesting; I included it because the OP commented that they only mentioned 8086 because they thought this might be an 8086 quirk. — Peter Cordes, Feb 02 '21 at 12:00
@PeterCordes: My toes are fine; I don’t think it at all useless or boring to have a look at how the interpretation of x86 prefix bytes changed over time and contrast old with new. The question as asked is on-topic here, so I don’t see a need to migrate it. — user3840170, Feb 02 '21 at 13:23
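To make ecm's comparison concrete, here is a minimal Python model of `rep movsw`. It is a sketch, not a cycle-accurate emulator. It assumes DF=0 (forward copy), a `bytearray` standing in for a single segment, and no interrupts, and `rep_movsw` is a hypothetical helper name. It shows that CX ends at zero while FLAGS pass through untouched, which differs from `xor cx, cx` because that instruction sets ZF.

```python
def rep_movsw(mem: bytearray, si: int, di: int, cx: int, flags: int):
    """Model of 8086 `rep movsw` with DF=0; returns (si, di, cx, flags)."""
    # jcxz .skip: with CX already zero, no word is copied at all
    while cx != 0:
        # movsw: copy one 16-bit word from [si] to [di], then advance both
        mem[di:di + 2] = mem[si:si + 2]
        si = (si + 2) & 0xFFFF
        di = (di + 2) & 0xFFFF
        # loop: decrement CX; like `loop`, this leaves FLAGS alone
        cx = (cx - 1) & 0xFFFF
    # FLAGS are returned unchanged, unlike after `xor cx, cx`
    return si, di, cx, flags


mem = bytearray(b"ABCDEF") + bytearray(6)
si, di, cx, flags = rep_movsw(mem, 0, 6, 3, 0x0002)
print(bytes(mem[6:12]), cx, hex(flags))  # prints: b'ABCDEF' 0 0x2
```

Passing a different `flags` value in shows the same thing: whatever goes in comes back out, whereas `xor cx, cx` would always leave ZF set.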

user3840170 · Accepted Answer · 2021-03-14T08:27:39.353

All of them. But it will only have an effect with a select few.

Contrary to what the question implies, the rep prefix is not an orthogonal looping construct that can be combined with any instruction. The 8086 family manual defines the use of rep/repe/repz (0xf3) and repne/repnz (0xf2) prefixes only in conjunction with string instructions, which are movs, cmps, scas, lods and stos; all other uses of those two prefixes are illegal. But since the original 8086 does not have an illegal opcode exception, every instruction necessarily has to do something, even when it is formally undefined. For most instructions, an illegal rep prefix is simply ignored, but this is not always the case; as was relatively recently discovered, using rep on multiplication and division instructions would invert the sign of the result.
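The Python table below gives a rough picture of which combinations the 8086 manual actually defines. It lists the ten 8086 string opcodes (ins/outs only arrived with the 186) and classifies an F3 or F2 prefix in front of any byte. It models the documented rules, not real silicon. Everything outside the table is simply reported as undefined, and the sign inversion of rep mul/div on real 8086s is not modelled. The names `STRING_OPS`, `COMPARE_OPS` and `rep_effect` are hypothetical.

```python
# 8086 string-instruction opcodes (byte and word forms)
STRING_OPS = {
    0xA4: "movsb", 0xA5: "movsw",
    0xA6: "cmpsb", 0xA7: "cmpsw",
    0xAA: "stosb", 0xAB: "stosw",
    0xAC: "lodsb", 0xAD: "lodsw",
    0xAE: "scasb", 0xAF: "scasw",
}
# cmps and scas are the only ones that test ZF between iterations
COMPARE_OPS = {0xA6, 0xA7, 0xAE, 0xAF}


def rep_effect(prefix: int, opcode: int) -> str:
    """Documented meaning of an F3 (rep/repe) or F2 (repne) prefix."""
    if prefix not in (0xF2, 0xF3):
        raise ValueError("not a repeat prefix")
    if opcode not in STRING_OPS:
        # e.g. `rep mov ds, ax`: formally undefined, in practice usually ignored
        return "undefined"
    if opcode in COMPARE_OPS:
        return "repeat while ZF=1" if prefix == 0xF3 else "repeat while ZF=0"
    if prefix == 0xF3:
        return "repeat CX times"
    # F2 on movs/stos/lods is not documented (some CPUs treat it like rep)
    return "undefined"


print(rep_effect(0xF3, 0x8E))  # 8E = mov Sreg, r/m16; prints: undefined
```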

While the 8086’s successor, the 80186, added the #UD (illegal opcode) exception, it was apparently triggered only for undefined instructions, not undefined combinations of an otherwise-defined instruction and prefix. Later revisions of the architecture would trigger #UD on undefined uses of the lock prefix, but not rep or repne. It’s not clear to me what Intel’s rationale for this was at the time, although this would prove useful in later revisions of the architecture:

The pause instruction, available since the Pentium 4, is encoded equivalently to rep nop, which means earlier versions of the architecture will simply ignore it, enabling operating systems to use it immediately without worrying about backwards compatibility.
A bug in certain AMD CPUs caused a performance penalty upon encountering a return instruction in certain circumstances. To work around it, the return opcode may be encoded as rep ret instead of a plain ret. This avoids the performance problem on defective CPUs, while other CPUs will simply ignore the prefix.
The Intel MPX instruction set extension for bounds checking consists entirely of opcodes that were reserved and do nothing on (at least some) previous revisions of the architecture which do not implement the extension. This includes a bnd prefix for branch instructions, which is encoded identically to repne, and as such is likewise ignored by CPUs without MPX support.
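All three examples rely on the same trick, sketched below in Python as a simplified model of an older CPU that drops a rep/repne prefix it has no use for. The byte encodings are the standard ones: F3 90 for pause, 90 for nop, F3 C3 for rep ret, and F2 E9 rel32 for a bnd jmp in 32-bit code. The `legacy_view` helper is hypothetical. It ignores 0F-escaped opcodes and every other prefix, and it models only the de-facto behaviour, not anything Intel guarantees in general.

```python
# 8086 string opcodes, for which F2/F3 actually mean something
STRING_OPCODES = set(range(0xA4, 0xA8)) | set(range(0xAA, 0xB0))


def legacy_view(insn: bytes) -> bytes:
    """What an older CPU effectively executes if it ignores F2/F3 prefixes
    that do not apply to the opcode (simplified model)."""
    i = 0
    while i < len(insn) and insn[i] in (0xF2, 0xF3):
        i += 1
    if i < len(insn) and insn[i] in STRING_OPCODES:
        return insn      # real repeat prefix: keep it
    return insn[i:]      # prefix has no meaning here: drop it


PAUSE = bytes([0xF3, 0x90])                            # pause == rep nop
REP_RET = bytes([0xF3, 0xC3])                          # rep ret
BND_JMP = bytes([0xF2, 0xE9, 0x10, 0x00, 0x00, 0x00])  # bnd jmp rel32

print(legacy_view(PAUSE).hex(), legacy_view(REP_RET).hex())  # prints: 90 c3
```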

Nevertheless, undocumented combinations of the prefix with instructions remain undefined, with all the usual consequences of undefined behaviour: they may break on subsequent revisions or alternative implementations of the architecture, they may behave non-deterministically, they may trigger strange behaviours, up to and including naſal dæmons. You are probably better off not using them, unless you have very good reasons to ignore such problems.

As for why the assembler doesn’t stop you from using illegal instructions: historically, assemblers have been pretty lenient in handling coding errors, for reasons both practical and, so to speak, philosophical:

Resource constraints. Back when assembly languages were dominant, CPU clocks ran at best a couple of megahertz, while memory sizes (working memory and exchangeable media, which at that time meant floppies) were at best a couple hundred kilobytes, so performance tuning was very important. (It was quite the feat to implement a reasonably performant spell checker back then.) Adding extensive error checks to a program (any program, not just an assembler) incurs costs, considered significant at the time, in running time and code size, not to mention complexity. Naturally, most programmers just didn’t bother, and instead ill-fatedly directed users to read the fabulous manual.
Completeness. An assembly language is supposed to be just a human-readable representation of the architecture’s instruction encoding; as such, it ought to be able to represent any possible instruction, whether it is actually supported and makes sense to execute or not. For example, current x86 assemblers are able to encode moves into and out of control registers CR5 through CR7, even though those registers currently have no defined use, none is in sight either, and the instructions simply raise #UD. (I suspect the registers don’t even exist physically on the chip.)
Forward compatibility. Occasionally, illegal instructions become legal. When that happens, it becomes advantageous to be able to use them without having to upgrade the assembler. If your assembler hasn’t yet added built-in support for the bnd prefix, you can simply write it as repne today without waiting for an updated version. This reason was much more important before Internet-based software distribution became popular, but still remains valid to an extent.
Support for de-facto standards. An opcode that is officially undocumented may nevertheless still exhibit useful behaviour on actual hardware. A programmer who understands the risks and/or is only interested in programming against the particular implementation of the architecture which provides it may choose to use such undefined instructions anyway, despite them being officially unsupported. When such use becomes popular enough, the manufacturer may unofficially maintain the behaviour for the sake of compatibility, and assemblers may start including a mnemonic for the opcode as a de-facto standard. Eventually, the opcode may even become officially supported and documented; this happened, for example, with icebp.
Programmers’ arrogance. There is an enduring stereotype of a low-level programmer who takes pride in doing all the menial tasks of programming by hand and proclaims things like ‘Real Programmers don’t use Pascal’ or ‘how dare the compiler check my array bounds’ or ‘if my program ever has a buffer overflow, it’s the user’s fault because they’re holding it wrong’. (If Raffzahn’s answer is any indication, it’s still a somewhat justified one.) Such a person would probably consider an assembler which checks their errors to be an insult to their programming ability and not consider it an attractive feature.

Somewhat unrelated, the F3H is a mandatory prefix for POPCNT, LZCNT, and ADOX (adds two unsigned integers plus carry, reading the carry from the overflow flag). ADCX (adds two unsigned integers plus carry, reading the carry from the carry flag) introduced with ADOX as Intel ADX, does not require the F3H prefix. Would be interesting to know the reasoning for this. — Single Malt, Feb 01 '21 at 11:06
Maybe Intel wanted to reuse some instruction decoding machinery. Since they have already special-cased pause, that’s just more of the same. — user3840170, Feb 01 '21 at 11:49
@SingleMalt: F3, F2, and 66 have been "mandatory prefixes" for various SSE instructions, like addss vs. addps. It's one of the prefixes that the decoder hardware must know to treat specially as a possible part of an opcode. It's a fairly normal way to invent new coding space even when you don't care about older CPUs silently ignoring the prefix, just like the rep bsr = lzcnt case (where they give different results for the same input, unlike rep bsf = tzcnt for non-zero inputs which can be used transparently for performance on AMD CPUs with slow bsf, fast tzcnt). — Peter Cordes, Feb 01 '21 at 21:56

Peter Cordes · Answer 2 · 2021-02-03T20:15:09.927

You can't repeat arbitrary instructions with rep.
In asm syntax, rep just means to include an F3 byte as a prefix for this instruction. There is no implication that it actually means repeat, it's just shorthand for db 0xF3. Assemblers exist to help you put the bytes you want into an object file. It's up to you to make good choices.
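The Python sketch below shows that rep is nothing more than an extra F3 byte. It hand-assembles the question's `mov ds, ax` from the `8E /r` (MOV Sreg, r/m16) encoding and then prepends F3. The helper names are hypothetical; the register numbers are the standard ModRM ones.

```python
# Segment-register numbers in the ModRM reg field
SREG = {"es": 0, "cs": 1, "ss": 2, "ds": 3}
# 16-bit general registers in ModRM r/m order
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def mov_sreg_r16(sreg: str, r16: str) -> bytes:
    # 8E /r = MOV Sreg, r/m16; mod=11 selects a register operand
    return bytes([0x8E, modrm(0b11, SREG[sreg], REG16[r16])])


REP = bytes([0xF3])  # all the `rep` mnemonic emits: db 0xF3

plain = mov_sreg_r16("ds", "ax")
with_rep = REP + plain
print(plain.hex(" "), "|", with_rep.hex(" "))  # prints: 8e d8 | f3 8e d8
```

The `xor cx, cx` from the question would be a separate two-byte instruction (31 C9 or 33 C9), and it never appears in the output. The prefix adds exactly one byte and nothing else.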

When F3 rep doesn't apply to an opcode, in practice it's ignored¹ (the same goes for other prefixes except lock, on CPUs after the 8086). Intel documents the behaviour as "undefined" because future CPUs could use that sequence as a multi-byte opcode that does something different (e.g. lzcnt = rep bsr, which gives different results on CPUs that know about lzcnt vs. older CPUs). So redundant rep prefixes are not a future-proof way to pad instructions for alignment instead of using NOPs.

Note 1: at least on modern x86 CPUs. As @user3840170's answer points out, rep mul can invert the sign of the result on actual 8086! Perhaps this is why Intel only ever says anything about specific cases, not in general.
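The difference between lzcnt and bsr can be shown with a small Python model of the four bit-scan instructions for a 32-bit operand. This is a sketch. The undefined destination of bsr/bsf for a zero input is represented as None, and flag outputs are ignored.

```python
WIDTH = 32


def bsr(x: int):
    """Index of the highest set bit; destination undefined for 0 (None here)."""
    return None if x == 0 else x.bit_length() - 1


def lzcnt(x: int) -> int:
    """Leading zero count; defined as WIDTH for 0 (encoded as rep bsr)."""
    return WIDTH - x.bit_length()


def bsf(x: int):
    """Index of the lowest set bit; destination undefined for 0 (None here)."""
    return None if x == 0 else (x & -x).bit_length() - 1


def tzcnt(x: int) -> int:
    """Trailing zero count; defined as WIDTH for 0 (encoded as rep bsf)."""
    return WIDTH if x == 0 else (x & -x).bit_length() - 1


for x in (1, 8, 0x80000000):
    # an older CPU that ignores the F3 prefix runs bsr/bsf instead
    print(hex(x), bsr(x), lzcnt(x), bsf(x), tzcnt(x))
```

For every non-zero input, bsf and tzcnt agree, while bsr(x) is always 31 - lzcnt(x), so the two never agree. This is why tzcnt can be used transparently and lzcnt cannot.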

From Intel's x86 manuals, vol.2, REP/REPE/REPZ/REPNE/REPNZ entry:

The REP prefix can be added to the INS, OUTS, MOVS, LODS, and STOS instructions, and the REPE, REPNE, REPZ, and REPNZ prefixes can be added to the CMPS and SCAS instructions. (The REPZ and REPNZ prefixes are synonymous forms of the REPE and REPNE prefixes, respectively.) The F3H prefix is defined for the following instructions and undefined for the rest:

F3H as REP/REPE/REPZ for string and input/output instruction.

F3H is a mandatory prefix for POPCNT, LZCNT, and ADOX.

(Intel's list is not complete: for example, it omits TZCNT and the use as part of PAUSE. But perhaps they're intentionally omitting cases where backwards compatibility by ignoring the REP prefix is useful, e.g. tzcnt = rep bsf, which gives the same result when the input is non-zero; pause = rep nop; F3 = the xrelease HLE prefix; and so on. The list also omits F3h as a prefix for some scalar SSE instructions, and F2h as part of crc32, for example.)

Every time Intel has introduced a new instruction using an F3 (or F2) byte as a mandatory prefix, they've documented the fact that older CPUs ignore the prefix when that's useful. This has allowed transparent Hardware Lock Elision when used with lock add or mov stores, for example, as well as pause safely running as nop on older CPUs. This is possible because CPUs do in practice ignore such prefixes; Intel just chooses not to document it except on a case-by-case basis when it's relevant and useful. (I haven't checked AMD's docs but I assume it's basically the same.)

(Some SSE1/2 scalar instructions like addsd use F2 or F3 as a mandatory prefix; I assume a CPU with SSE1 but not SSE2 would run F2 0F 58 as addps instead of addsd, although this is not documented or useful. F3 0F 58 is SSE1 addss, so there wouldn't be any CPUs that could ignore the F3 and run it as SSE1 addps.)
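The mandatory-prefix scheme for the 0F 58 add family can be sketched as a Python lookup keyed on (prefix, opcode). The four encodings are the documented ones. The `decode` helper is hypothetical: it ignores the ModRM byte and every other opcode, and it does not model what an SSE1-only CPU would do with F2 0F 58, which (as noted above) is undocumented.

```python
ADD_FAMILY = {
    (None, 0x58): "addps",  # 0F 58     SSE1 packed single
    (0x66, 0x58): "addpd",  # 66 0F 58  SSE2 packed double
    (0xF3, 0x58): "addss",  # F3 0F 58  SSE1 scalar single
    (0xF2, 0x58): "addsd",  # F2 0F 58  SSE2 scalar double
}


def decode(insn: bytes) -> str:
    """Decode one 0F 58 instruction, using 66/F2/F3 as a mandatory prefix."""
    prefix = None
    if insn[0] in (0x66, 0xF2, 0xF3):
        prefix, insn = insn[0], insn[1:]
    if insn[0] != 0x0F:
        raise ValueError("expected a 0F escape byte")
    # insn[2], the ModRM byte selecting the xmm registers, is ignored here
    return ADD_FAMILY[(prefix, insn[1])]


print(decode(bytes([0xF3, 0x0F, 0x58, 0xC1])))  # prints: addss
```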

Related Q&As for more detail:

What does “rep; nop;” mean in x86 assembly? Is it the same as the “pause” instruction?
What does rep ret mean? not documented, but a de-facto standard thanks to widespread use by GCC to avoid AMD branch-mispredict penalties for a 1-byte ret following a branch. This is explicitly taking advantage of the "current CPUs ignore rep" behaviour, not as a new instruction.
why do repe and repne do the same before movsb? - Apparently (some?) CPUs run F2 movsb the same as normal F3 rep movsb.
Combining prefixes in SSE - mixing and matching prefixes other than mandatory ones.
Repeat prefixes and mandatory prefixes in x86 - f2 0f 38 f0 is definitely CRC32, not rep movbe, despite its 0f 38 f0 encoding.

F3 to actually repeat something

An F3 byte behaves as a true rep prefix (repeating for [E/R]CX counts) only for the "string"¹ instructions documented in the rep entry in Intel's current vol.2 manual. These are still only stos/movs/ins/outs², and, somewhat surprisingly, lods, even though just loading usually doesn't have side effects (except in MMIO regions, or faulting on unmapped pages or segment limits on CPUs later than 8086).

The F3 byte can also act as a repe aka repz prefix for cmps and scas, documented in the same page of Intel's manual. In that case, repeating stops if ZF==0 after any step. (As with rep, E/RCX is checked for !=0 before even the first step, but ZF is only checked after the first step. So you don't have to create good FLAGS state before using it, unless E/RCX might be zero initially and you want to branch on ZF afterwards.) repne aka repnz is an F2 byte. It only applies to those two instructions, as a repeat with the opposite ZF condition.
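The repe termination rules fit in a short Python model of `repe cmpsb`, including the point that a zero CX leaves ZF untouched. The model assumes DF=0 and one flat `bytearray`, and it tracks only ZF. `repe_cmpsb` is a hypothetical name.

```python
def repe_cmpsb(mem: bytearray, si: int, di: int, cx: int, zf: bool):
    """Model of `repe cmpsb` with DF=0; returns (si, di, cx, zf).
    Only ZF is tracked; the other flags set by the compare are ignored."""
    while cx != 0:                  # CX is tested before every step
        zf = mem[si] == mem[di]     # cmpsb: compare [si] with [di]
        si = (si + 1) & 0xFFFF
        di = (di + 1) & 0xFFFF
        cx = (cx - 1) & 0xFFFF
        if not zf:                  # ZF is tested only after a step
            break
    return si, di, cx, zf           # with CX=0 on entry, ZF comes back as given


mem = bytearray(b"abcdabXd")
si, di, cx, zf = repe_cmpsb(mem, 0, 4, 4, True)
print(si - 1, cx, zf)  # prints: 2 1 False
```

The mismatch is at offset `si - 1` because SI has already advanced past the differing byte.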

Note 1: In C terms, rep/repe/repne implement memset/memcpy/memcmp/memchr, not the str* functions for 0-terminated C strings. These are "explicit-length string" instructions. I think explicit-length strings were common when the 8086 was new, and they are now back in favour with C++ std::string, unlike C char*.
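The Python sketch below illustrates the memchr correspondence by wrapping `repne scasb` as a memchr-style search. It assumes DF=0, one flat `bytearray` of at most 0xFFFF bytes, and tracks only ZF. Both function names are hypothetical. CX bounds the scan, which is what makes this an explicit-length operation. A search for a 0 byte only behaves like strlen if a terminator actually occurs within CX bytes.

```python
def repne_scasb(mem: bytearray, di: int, cx: int, al: int, zf: bool):
    """Model of `repne scasb` with DF=0; returns (di, cx, zf)."""
    while cx != 0:
        zf = mem[di] == al          # scasb: compare AL with [di]
        di = (di + 1) & 0xFFFF
        cx = (cx - 1) & 0xFFFF
        if zf:                      # repne stops once a match sets ZF
            break
    return di, cx, zf


def memchr_index(buf: bytearray, value: int):
    """memchr-style wrapper: index of the first `value` byte, or None."""
    di, _, zf = repne_scasb(buf, 0, len(buf), value, False)
    # DI has already moved one past the matching byte
    return di - 1 if zf else None


print(memchr_index(bytearray(b"hello"), ord("l")))  # prints: 2
```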

Note 2: insb/w and outsb/w were new in 186.

Note that only rep movs and rep stos are fast on modern CPUs, with optimized microcode that copies or stores in chunks of 16, 32, or maybe even 64 bytes at a time. (SIMD loops can still be better). repe cmpsb is dog slow (e.g. one compare per 2 cycles on Skylake, 3 on Zen) and easily beaten with even SSE2 pcmpeqb / pmovmskb to implement memcmp. Or even scalar bithacks for strlen / strcmp.

8086 design bug: when a rep instruction with multiple prefixes is interrupted, the saved IP points at the last prefix. So cs rep movsb will resume as rep movsb without a CS segment override!

Later x86 CPUs fixed this, but on 8086 / 8088 you either avoid that entirely, or write it as rep cs movsb inside a loop that compensates. So if it's interrupted, on resume it runs one cs movsb (without decrementing CX because the rep got skipped). You can detect the non-zero CX and either re-calculate from the pointers or work out some logic to get CX correct and jump back to before the rep prefix.
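The resume problem can be sketched in Python by modelling only which bytes get re-executed after the interrupt returns. This simplified model follows the description above: the 8086/8088 saves the address of the last prefix, while later CPUs save the address of the first. The function name is hypothetical, and the opcode is assumed to be a single byte at the end.

```python
CS_OVERRIDE, REP, MOVSB = 0x2E, 0xF3, 0xA4


def resume_bytes(insn: bytes, cpu: str) -> bytes:
    """Bytes re-executed after an interrupt hits this rep-string instruction
    part-way through (single-byte opcode assumed at the end)."""
    if len(insn) < 2:
        raise ValueError("expected at least one prefix before the opcode")
    if cpu in ("8086", "8088"):
        # saved IP points at the last prefix: earlier prefixes are forgotten
        return insn[-2:]
    # later CPUs resume at the first prefix byte
    return insn


# cs rep movsb resumes as plain rep movsb, losing the CS override
print(resume_bytes(bytes([CS_OVERRIDE, REP, MOVSB]), "8086").hex(" "))  # prints: f3 a4
```

With the `rep cs movsb` ordering, the same model resumes with a single `cs movsb` and no rep. This is why the surrounding loop has to check CX afterwards.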

Fun fact #2: when single-stepping (with TF=1, the trap flag), rep-string instructions trap after every count.

Fun fact #3: Per discussion in comments on @Raffzahn's answer, 8086 CPUs assert some external queue-status signals every time they decode a prefix or an opcode. Prefixes clearly behave differently from instructions in terms of software-visible behaviour, so this appears to be an 8086 implementation detail. Prefixes aren't truly "opcodes", despite being documented in the manual along with instructions. (rep and lock still have their own entries in Intel's vol.2 manual; segment overrides don't. Other prefixes like xacquire (F2) and xrelease (F3) also have entries.)

Pascal uses explicit-length strings, and seems to have played a role in the design of the 8086. — Stephen Kitt, Feb 01 '21 at 23:08
@StephenKitt: Yes, exactly the kind of popular-at-the-time language I had in mind, before the rise of C's popularity. (Or maybe just that specific language, as you say. e.g. the 186 enter instruction with a non-zero first operand handles the complexity of stack frames for nested functions, also a thing Pascal wants but not C.) Pascal was popular at the time, so hard to say whether Pascal specifically was the motivation here, or explicit-length strings in general. Certainly relevant for hand-written asm dealing with arrays / buffers. — Peter Cordes, Feb 01 '21 at 23:12
This answer gives me the bizarre impression that it is simultaneously clearer and messier than my own. On one hand, I like how it distinguishes between the rep mnemonic and the prefix byte it generates; in hindsight, I should have done the same. On the other, it’s over-formatted, tangents are all over the place, and the focus on later generations of x86 makes it borderline ahistorical. (The question was asked about the 8086 after all.) — user3840170, Feb 02 '21 at 09:29
@user3840170: The OP mentioned in comments that the only reason for asking on retrocomputing was they thought it might have been an 8086 or bits 16 quirk, not that they specifically cared about 8086. They're clearly using a modern assembler (otherwise there'd be no support for a bits directive because the 386 introduced that.) I thought the OP and others might be interested in the modern-x86 tangents which I felt like writing down somewhere. I think I partly wanted to expand on things in your answer and comments, and didn't spend too long coming up with a coherent order to present in :/ — Peter Cordes, Feb 02 '21 at 09:43
@user3840170: Can you be more specific about what you mean by over-formatted? For skimmability I think it's generally good to have distinct sections with a topic sentence or key point bolded. It's certainly somewhat "busy", but with multiple different points to make, that's inevitable. Is that what you mean? Do you have any specific suggestion for changes? — Peter Cordes, Feb 02 '21 at 09:47
@Peter Cordes yes Pascal was very likely the inspiration for these instructions. If you look at the 2 parameters of the ENTER instruction, it was specifically designed that way to allow the nested functions the language allows. There is also the BOUND instruction that hints strongly to Pascal as it has array bound checking by default, which C does not support at all. — Patrick Schlüter, Feb 02 '21 at 09:53
@PatrickSchlüter: As I commented earlier, enter was new in 186, not 8086. So was bound (https://ulukai.org/ecm/insref.htm#i48). This alone is weak evidence for 8086 itself being Pascal-inspired. I think there's some historical evidence or documentation that it is/was, though. But thanks for confirming that the way enter works matches up well with Pascal nested functions specifically. — Peter Cordes, Feb 02 '21 at 10:01
@PeterCordes: Mainly, repeatedly bolding a sentence-long passage per paragraph is about as grating as overuse of exclamation marks. My rule of thumb is at most three bolded words per paragraph, and at most one whole bolded sentence per chapter. (And footnotes go at the bottom, as the name implies.) — user3840170, Feb 02 '21 at 12:57
"As with rep, E/RCX is checked for !=0 before even the first step, but ZF is only checked after the first step so you don't have to create good FLAGS state before using." -- Actually, while rep(n)e will repeat regardless of the initial ZF, there is a case where you need to "create good ZF state before using". That's if the input (r/e)cx may be zero. If the counter is zero then the ZF state is left unaltered from what it was before the prefixed cmps/scas instruction. It's needed to initialise ZF before then. — ecm, Feb 03 '21 at 13:34
@Peter Cordes: Better link to the insref on bound is https://ulukai.org/ecm/doc/insref.htm#insBOUND (doc/ subdirectory and the anchor name from the instruction name) — ecm, Feb 03 '21 at 13:36
@PeterCordes: The RET N instruction, present on the 8088/8086, was used by Pascal compilers on the PC, and I think that instruction clearly exists for the purpose of supporting that language. As for why other string instructions aren't designed around zero-terminated strings, that's because zero-terminated strings are only really appropriate in a few narrow situations for which the only useful string instruction would be SCASB with AL==0, which in fact works just fine. — supercat, Feb 03 '21 at 17:28
@ecm: Thanks for the tip on anchor names, updated INSB in my answer. Can't of course fix bound in my comment. Also added stuff about ZF state for the CX=0 possibility, thanks. — Peter Cordes, Feb 03 '21 at 19:23
@user3840170: Good point about "footnote"; I like them where they are so I changed them to "notes". In a real printed book or paper, footnotes would be at the bottom of each page, within eye-glance of the pointer to them, as opposed to end-notes. Sometimes they're not really optional reading, just expanding to more exactness on something that I glossed over in an earlier paragraph. So I want them there in that order, and I use a superscript note instead of a parenthetical to let the reader know when (for example) I'm over-simplifying something to the point of being wrong in some cases. — Peter Cordes, Feb 03 '21 at 19:27
"So cs rep movsb will result as rep movsb with a CS segment override!" Wording mistake, should read "without a". — ecm, Feb 03 '21 at 20:05
@ecm: Thanks, also meant to write "resume" instead of "result" :P — Peter Cordes, Feb 03 '21 at 20:15

Raffzahn · Answer 3 · 2021-01-31T14:12:04.393

[You might want to add a bit more information to start with - like what assembler you're using, or what kind of 'Error thrown' you expect. From assembler? Linker? Debugger? Or some OS/Runtime? Otherwise it's hard to give any definite answer.]

In general REP is simply an opcode that can come at any point like any other opcode. It isn't invalid on its own. Now, when it comes to usefulness it might make more sense in front of any string operation.

So maybe answer for yourself, why do you expect to be stopped from using it - and by what component?

The underlying issue might be more about the assembler used. In general an assembler is meant to allow the programmer to generate any code he wants - even illegal. And while one might appreciate some warning, this case is not illegal in any way. Besides, there is no way the assembler can guess if that instruction is meant as code to be executed or some data, for example used to synthesize instructions during run time. I'd delete any assembler that would stop me from programming right the moment he tries to :)

