You can't repeat arbitrary instructions with rep.
In asm syntax, rep just means to include an F3 byte as a prefix for this instruction. There is no implication that it actually means repeat; it's just shorthand for db 0xF3. Assemblers exist to help you put the bytes you want into an object file. It's up to you to make good choices.
When F3 rep doesn't apply to an opcode, in practice it's ignored (see Note 1 below; the same goes for other prefixes except lock, in CPUs after 8086). Intel documents the behaviour as "undefined" because future CPUs could use that sequence as a multi-byte opcode that does something different (e.g. lzcnt = rep bsr, giving different results on CPUs that know about lzcnt vs. older CPUs). So redundant prefixes are not a future-proof way to pad instructions for alignment in place of NOPs.
Note 1: at least on modern x86 CPUs. As @user3840170's answer points out, rep mul can invert the sign of the result on actual 8086! Perhaps this is why Intel only ever says anything about specific cases, not in general.
From Intel's x86 manuals, vol.2, REP/REPE/REPZ/REPNE/REPNZ entry:
The REP prefix can be added to the INS, OUTS, MOVS, LODS, and STOS instructions, and the REPE, REPNE, REPZ, and REPNZ prefixes can be added to the CMPS and SCAS instructions. (The REPZ and REPNZ prefixes are synonymous forms of the REPE and REPNE prefixes, respectively.) The F3H prefix is defined for the following instructions and undefined for the rest:
- F3H as REP/REPE/REPZ for string and input/output instruction.
- F3H is a mandatory prefix for POPCNT, LZCNT, and ADOX.
(Intel's list is not complete, for example omitting TZCNT, and the use of F3 as part of PAUSE. But perhaps they're intentionally omitting cases where backwards compat by ignoring the REP prefix is useful, e.g. tzcnt = rep bsf gives the same result when the input is non-zero, pause = rep nop, F3 = xrelease HLE prefix, and so on. It also omits mention of F3h as a prefix for some scalar SSE instructions, and of F2h as part of crc32 for example.)
Every time Intel has introduced a new instruction using an F3 (or F2) byte as a mandatory prefix, they've documented the fact that older CPUs ignore the prefix when that's useful. This has allowed transparent Hardware Lock Elision when used with lock add or mov stores, for example, as well as pause safely running as nop on older CPUs. This is possible because CPUs do in practice ignore such prefixes; Intel just chooses not to document it except on a case-by-case basis when it's relevant and useful. (I haven't checked AMD's docs but I assume it's basically the same.)
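To make the difference between the two cases concrete, here is a minimal C sketch of what the bit-scan instructions compute. It is only a model of the results for 32-bit operands, not of the hardware; the function names (model_bsr32 and so on) are made up for this illustration, and the value returned for a zero input to the bsr/bsf models is an arbitrary placeholder because the real result there is not something to rely on.

```c
#include <stdint.h>

/* Model of bsr for a non-zero input: index of the highest set bit.
   For x == 0 this returns 0 as a placeholder only. */
unsigned model_bsr32(uint32_t x) {
    unsigned idx = 0;
    while (x >>= 1) {
        idx++;
    }
    return idx;
}

/* Model of lzcnt: number of leading zero bits, 32 for a zero input. */
unsigned model_lzcnt32(uint32_t x) {
    if (x == 0) {
        return 32;
    }
    return 31 - model_bsr32(x);
}

/* Model of bsf for a non-zero input: index of the lowest set bit.
   For x == 0 this returns 0 as a placeholder only. */
unsigned model_bsf32(uint32_t x) {
    unsigned idx = 0;
    if (x == 0) {
        return 0;
    }
    while ((x & 1u) == 0) {
        x >>= 1;
        idx++;
    }
    return idx;
}

/* Model of tzcnt: number of trailing zero bits, 32 for a zero input. */
unsigned model_tzcnt32(uint32_t x) {
    if (x == 0) {
        return 32;
    }
    return model_bsf32(x);
}
```

For any non-zero input the bsf and tzcnt models agree, which is why running tzcnt as bsf on an older CPU can be harmless. The bsr and lzcnt models disagree for almost every input (for x = 1 they give 0 and 31), so the same bytes produce different results depending on whether the CPU knows lzcnt.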
(Some SSE1/2 scalar instructions like addsd use F2 or F3 as a mandatory prefix; I assume a CPU with SSE1 but not SSE2 would run F2 0F 58 as addps instead of addsd, although this is not documented or useful. F3 0F 58 is SSE1 addss, so there wouldn't be any CPUs that could ignore the F3 and run it as SSE1 addps)
Related Q&As for more detail:
F3 to actually repeat something
An F3 byte behaves as a true rep prefix (repeating for [E/R]CX counts) only for the "string" instructions (see Note 1 below) documented in the rep entry in Intel's current vol.2 manual: still only stos/movs/ins/outs (see Note 2 below). And somewhat surprisingly also lods, even though just loading usually doesn't have side effects (except in MMIO regions, or faulting on unmapped pages or segment limits on CPUs later than 8086).
The F3 byte can also act as a repe aka repz prefix for cmps and scas, documented in the same page of Intel's manual. In that case, repeating stops if ZF==0 after any step. (As with rep, E/RCX is checked for !=0 before even the first step, but ZF is only checked after the first step so you don't have to create good FLAGS state before using, unless E/RCX might be zero initially and you want to branch on ZF after.) repne aka repnz is an F2 byte, and only applies to those two instructions as a repeat with the opposite ZF condition.
Note 1: In C terms, rep/repe implements memset/memcpy/memcmp/memchr, not str* for 0-terminated C strings. These are "explicit-length string" instructions, which I think was common at the time 8086 was new, and is now back in favour with C++ std::string unlike C char*.
Note 2: insb/w and outsb/w were new in 186.
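The repeat rules described above can be sketched as a small C model. This is an illustration of the semantics only, assuming DF=0 (pointers increment) and byte-sized operations; the struct rep_state type and the function names are hypothetical, and the model ignores segments, interrupts and faults.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state: only the registers these instructions use. */
struct rep_state {
    size_t cx;               /* count register, [E/R]CX */
    const unsigned char *si; /* source pointer, [E/R]SI */
    unsigned char *di;       /* destination pointer, [E/R]DI */
    bool zf;                 /* zero flag */
};

/* Model of rep movsb: the count is tested before every step,
   including the first, and the decrement does not touch FLAGS. */
void model_rep_movsb(struct rep_state *s) {
    while (s->cx != 0) {
        *s->di++ = *s->si++; /* one movsb step */
        s->cx--;
    }
}

/* Model of repe cmpsb: same count check, plus a ZF check after
   each step. With cx == 0 nothing runs and ZF keeps its old value. */
void model_repe_cmpsb(struct rep_state *s) {
    while (s->cx != 0) {
        unsigned char a = *s->si++;
        unsigned char b = *s->di++;
        s->cx--;
        s->zf = (a == b);    /* cmpsb sets ZF from the comparison */
        if (!s->zf) {
            break;           /* repe stops once ZF == 0 */
        }
    }
}
```

Note that after a mismatch the pointers have already advanced past the differing bytes and the count has already been decremented for that step, so code that wants the mismatch position has to back up by one.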
Note that only rep movs and rep stos are fast on modern CPUs, with optimized microcode that copies or stores in chunks of 16, 32, or maybe even 64 bytes at a time. (SIMD loops can still be better). repe cmpsb is dog slow (e.g. one compare per 2 cycles on Skylake, 3 on Zen) and easily beaten with even SSE2 pcmpeqb / pmovmskb to implement memcmp. Or even scalar bithacks for strlen / strcmp.
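As a rough illustration of the pcmpeqb / pmovmskb approach mentioned above, here is a sketch in C using the SSE2 intrinsics from emmintrin.h. The function name first_diff is made up for this example, and it returns the index of the first differing byte rather than memcmp's signed result. It is a minimal sketch and not a tuned implementation: it uses unaligned loads, handles the tail with a scalar loop, and falls back to the scalar loop entirely when the compiler doesn't define __SSE2__.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Returns the index of the first byte where a and b differ,
   or n if the first n bytes are equal. */
size_t first_diff(const unsigned char *a, const unsigned char *b, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i eq = _mm_cmpeq_epi8(va, vb);             /* pcmpeqb */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq); /* pmovmskb */
        if (mask != 0xFFFFu) {
            /* At least one byte differs: find the lowest clear bit. */
            unsigned diff = ~mask & 0xFFFFu;
            while ((diff & 1u) == 0) {
                diff >>= 1;
                i++;
            }
            return i;
        }
    }
#endif
    /* Scalar tail, and the whole loop on non-SSE2 builds. */
    for (; i < n; i++) {
        if (a[i] != b[i]) {
            return i;
        }
    }
    return n;
}
```

Each iteration of the vector loop checks 16 bytes with a handful of instructions, compared with one byte per couple of cycles for repe cmpsb.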
8086 design bug: when a rep instruction with multiple prefixes is interrupted, the saved IP points at the last prefix. So cs rep movsb will resume as rep movsb without a CS segment override!
Later x86 CPUs fixed this, but on 8086 / 8088 you either avoid that entirely, or write it as rep cs movsb inside a loop that compensates. So if it's interrupted, on resume it runs one cs movsb (without decrementing CX because the rep got skipped). You can detect the non-zero CX and either re-calculate from the pointers or work out some logic to get CX correct and jump back to before the rep prefix.
Fun fact #2: when single-stepping (with TF=1, the trap flag), rep-string instructions trap after every count.
Fun fact #3: Per discussion in comments on @Raffzahn's answer, 8086 CPUs assert some external queue-status signals every time they decode a prefix or an opcode. Prefixes clearly behave differently from instructions in terms of software-visible behaviour, so this appears to be an 8086 implementation detail. Prefixes aren't truly "opcodes", despite being documented in the manual along with instructions. (rep and lock still are documented that way; segment overrides don't have their own entry in Intel's vol.2 manual. Other prefixes like xacquire (F2) and xrelease (F3) also have entries.)
rep does not set cx to zero when used on a non-string instruction. Even when used on unconditional string instructions (lods, stos, movs) the effect is to repeat until cx zero, but the instruction does not alter the status flags in the way a xor cx, cx would. Say, rep movsw is more similar to jcxz .skip \ .loop: \ movsw \ loop .loop \ .skip: – ecm Jan 31 '21 at 20:33
bits 16 doesn't mean "set for 8086". The whole reason for x86's continued existence is backwards-compatibility: current x86 CPUs support real mode (and 16-bit sub-mode of protected and long modes). bits 16 is appropriate for all of those, and allows usage of MMX/SSE instructions; basically anything that doesn't need a VEX or EVEX prefix, along with encoding 32-bit operand and address size via prefixes. "8086 mode" would be YASM's CPU 8086 or similar instruction-set restriction / checking features in other assemblers, like MASM's .8086 I think. – Peter Cordes Feb 02 '21 at 11:55