Z80 string iteration failing

Question

This is my first Z80 project after the 6502, and a simple-looking piece of code is giving me a hard time.

print_char is a working routine that prints the contents of A to the screen.

My first issue is that the following code (and many variations of it) gives an "index offset out of bound" error with VASM. Any idea how the assembler checks the index offset here and why it flags an error?

Is there a better way to print a string? On the 6502, LDA affects the zero flag; this is not the case on the Z80, so I am trying OR 0 and checking the result against zero, but then I use the B register because OR changes A. Doesn't look very elegant.

loop:
    ld A,(IY+msg)
    ld b,a
    or 0
    jp z, end
    ld a,b
    call print_char
    inc IY
    jp loop
 end:
    halt
msg: .asciiz "Hello World"

Raffzahn · Accepted Answer · score 13 · 2021-03-25T10:49:32.597

Since it's two questions, here are two answers:

Addressing Issue

It's a very basic 65xx-to-x80 transition error: index registers.

Where 65xx CPUs use a 16-bit base address (from memory) and an 8-bit index (from a register), the x80s use 16-bit index register(s) and an optional signed 8-bit offset. The address of msg should be loaded into IY first (LD IY,msg), and then this register can simply be used to address the bytes to be loaded (LD A,(IY)).

Historically the x80s simply use one memory pointer, in classic 8008/8080 notation called M, residing in the register pair HL. All memory access that is not immediate or absolute had to go through it (the 8080 adds only the accumulator-only LDAX/STAX via BC and DE). So what is LD A,(HL) on a Z80 is written MOV A,M on the 8080 (*1). DE could be used to hold a second pointer, to be exchanged with HL when needed. The Z80 relaxed this a bit by adding IX/IY.

All addressing is done using such a pointer (optionally, on the Z80, plus a short offset). This is similar to the 6800, where the generic 16-bit index register IX was to be used.

When the 6500 series was designed by former Motorola engineers, they flipped the concept by putting the base address not in a register but in memory and adding a (short) 8 bit register as index. They even marketed it as an advantage, being a 'true' index now. Well, I guess that's debatable.

So switching from 65xx to x80 programming also means switching from a base address held in memory to one held in a register.

For all practical purposes, think of HL, IX and IY on the Z80 as pointers, like in C. Just have them point somewhere, increment or decrement them, add or subtract a value, and that's it. No complex addressing. That's also the reason C compilers are much less work to implement on an x80 or 68xx CPU than on a 65xx: these CPUs fit the simple abstraction C makes of a CPU way better than the finely tuned 65xx structure.

*1 - Which is a way to avoid complex notation. For the 8008 it was even less ambiguous as LAM - for more history see this answer
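Since the analogy is to C anyway, here is a small C model of the two addressing styles. It is a sketch under simplifying assumptions: memory is a flat 64 KiB byte array, `mem`, `lda_abs_x`, `ld_a_iy_d` and `walk_hl` are hypothetical names, and flags and timing are ignored. It only shows where each CPU keeps the 16-bit part and where it keeps the 8-bit part of an address, and how HL is simply walked like a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat 64 KiB address space, for illustration only. */
static uint8_t mem[65536];

/* 6502 "LDA base,X": the 16-bit base is a constant in the instruction,
   the unsigned 8-bit index lives in register X. */
static uint8_t lda_abs_x(uint16_t base, uint8_t x) {
    return mem[(uint16_t)(base + x)];
}

/* Z80 "LD A,(IY+d)": the 16-bit address lives in register IY,
   the signed 8-bit displacement d is a constant in the instruction. */
static uint8_t ld_a_iy_d(uint16_t iy, int8_t d) {
    return mem[(uint16_t)(iy + d)];
}

/* Z80 idiom: use HL like a C pointer and simply advance it.
   Returns the number of bytes before the zero terminator. */
static unsigned walk_hl(uint16_t hl) {
    unsigned n = 0;
    while (mem[hl] != 0) {   /* LD A,(HL) / OR A / JP Z,end */
        n++;
        hl++;                /* INC HL */
    }
    return n;
}
```

For example, after `memcpy(&mem[0x8000], "Hello World", 12)`, both `lda_abs_x(0x8000, 4)` and `ld_a_iy_d(0x8000, 4)` read the 'o', but only the Z80 form can also reach backwards, as in `ld_a_iy_d(0x8004, -4)`, and `walk_hl(0x8000)` returns 11.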

Code Optimization

Is there a better way to print a string?

Depends on your environment.

On the 6502, LDA affects the zero flag; this is not the case on the Z80, so I am trying OR 0 and checking the result against zero, but then I use the B register because OR changes A. Doesn't look very elegant.

Not really sure why you're using B and OR 0 (*2). The simplest way is to OR A with itself, as in OR A. Next, HL is faster than IY, as it doesn't need a prefix byte. In general, many Z80-only features (IX/IY and many ED-prefixed instructions) come with the overhead of a prefix byte and thus slower execution. So the preferred pointer register is always HL, followed by DE.
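To put rough numbers on that prefix cost: per character, excluding print_char itself and using the documented Z80 T-state counts, the straightened zero-terminated loop costs the following with HL, compared with the same loop using LD A,(IY+0) and INC IY in place of LD A,(HL) and INC HL (the loop body also grows from 12 to 15 bytes):

$$
\begin{aligned}
\text{HL: } & \underbrace{7}_{\text{LD A,(HL)}} + \underbrace{4}_{\text{OR A}} + \underbrace{10}_{\text{JP Z}} + \underbrace{17}_{\text{CALL}} + \underbrace{6}_{\text{INC HL}} + \underbrace{10}_{\text{JP}} = 54 \\
\text{IY: } & \underbrace{19}_{\text{LD A,(IY+0)}} + 4 + 10 + 17 + \underbrace{10}_{\text{INC IY}} + 10 = 70
\end{aligned}
$$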

A straightened version might look like this:

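    LD   HL,msg         ; HL points at the first character
loop:
    LD   A,(HL)         ; fetch the next character
    OR   A              ; Z is set if it is the terminating zero
    JP   Z,end
    CALL print_char     ; must preserve HL
    INC  HL
    JP   loop
end:
    HALT
msg: .asciiz "Hello World"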

This code also uses only instructions the 8080 has, so it will run on any x80 CPU.

Then again, and here the Z80 is still better than what C assumes, it can count and jump in one instruction (DJNZ), which makes length-prefixed strings much more practical than zero-terminated ones (*3). So real Z80-style string handling would look like this:

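    LD   HL,msg
    LD   B,(HL)         ; B = length byte
loop:
    INC  HL             ; step to the next character
    LD   A,(HL)
    CALL print_char     ; must preserve B and HL
    DJNZ loop           ; decrement B, loop while not zero
end:
    HALT
msg .byte 11,"Hello World"   ; length byte, then the text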

Now the inner loop is only 4 instructions and as fast as it can get.
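A C sketch of what that loop does, mainly to make DJNZ's decrement-then-test behaviour explicit; `print_counted` and its `out` buffer are hypothetical stand-ins for the HL walk and print_char. Two consequences follow: the single length byte limits strings to 255 characters, and a length of 0 does not mean "empty" but wraps B to 255 and runs the body 256 times.

```c
#include <assert.h>
#include <stdint.h>

/* Models the DJNZ loop over a length-prefixed string.
   msg[0] is the length byte, the text follows it.
   The bytes that would go to print_char are copied to out.
   Returns how many bytes were "printed". */
static unsigned print_counted(const uint8_t *msg, uint8_t *out) {
    const uint8_t *hl = msg;        /* LD HL,msg */
    uint8_t b = *hl;                /* LD B,(HL) */
    unsigned n = 0;
    do {
        hl++;                       /* INC HL */
        out[n++] = *hl;             /* LD A,(HL) / CALL print_char */
        b--;                        /* DJNZ: decrement B first (0 wraps to 255) ... */
    } while (b != 0);               /* ... then loop while B is not zero */
    return n;
}
```

With the `msg` bytes from the example (11 followed by "Hello World") the function emits exactly 11 characters, while a length byte of 0 emits 256, which is why an empty string needs either a check before the loop or a different encoding.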

In fact, this is another 65xx-to-Z80 mindset step to take. Zero-delimited strings come (somewhat) naturally on the 65xx (*4), as its loads set the flags according to the character loaded, while the Z80 works best with length-delimited strings, due to the implied use of B as the counter register.

I'd strongly recommend taking that step and going full Z80 on this.

*2 - Here lies another benefit/pitfall when moving between these CPUs: the 8080 is way more orthogonal than the 6502, implementing all register-to-register variations, including otherwise strange ones like ORing or ANDing A with A.

*3 - Which they are anyway. Length-prefixed strings have no issues with any embedded data, including zero bytes.

*4 - As inherited from its real ancestor 6800 and spiritual ancestor PDP.

edited Mar 25 '21 at 10:49

answered Mar 24 '21 at 15:17

Raffzahn

Thanks a lot for the explanation, it makes sense now. And yes, OR A is way more elegant. That mess with B was me trying to preserve A's value while doing OR 0, naturally OR A removes this need. – Charles Mar 24 '21 at 16:10
@Charles Oh, yes, that's another benefit/pitfall when moving between these CPUs - the 8080 is way more orthogonal implementing all register to register variations than the 6502. – Raffzahn Mar 24 '21 at 16:16
@Raffzahn: On the flip side, the 6502 is more orthogonal with regard to the set of addressing modes that are useful with instructions like lda, adc, ora, etc. – supercat Mar 24 '21 at 19:17
Another trick is to have a function begin and end with ex hl,(sp), in which case the message must immediately follow the call to the print function, but the calling code wouldn't need to load HL. – supercat Mar 24 '21 at 19:19
Now that strings are length-encoded, a natural step would be automatic length assignment through some type of macro, otherwise it is way too error-prone. – lvd Mar 25 '21 at 08:12
@lvd Of course. something like <label> .byte <endlabel>-<label>,"Hello World",<endlabel> though, implementation depends on the assembler. – Raffzahn Mar 25 '21 at 10:48
LD A,(HL) on a Z80 is actually MOV M,A on 8080 (destination second, not first). – Toby Speight Mar 25 '21 at 12:55
@TobySpeight Not in my book (see page 4-4). Intel always used a destination first notation. – Raffzahn Mar 25 '21 at 16:12
My memory is evidently failing - I was sure I had to deal with that when using an Intel assembler, having learnt Zilog first. It must have been 6800 that's the other way around... – Toby Speight Mar 25 '21 at 16:25
And if you happen to target the Game Boy processor, which is somewhere in between the 8080 and the Z80 feature-wise, you even get LD A,(HL+), which is like LODSB on the 8086: it loads the byte from (HL) into A and then increments HL, with a single opcode byte. But as that CPU doesn't have DJNZ, you need to split that instruction into DEC B and JR NZ,loop, and you will lose the byte saved there. – Michael Karcher Mar 25 '21 at 16:39
@TobySpeight Well, we all grow older ... err ... wiser :)) 6800 uses a single operand syntax (like 6502, or 68k). What uses that weird s,d syntax is the GNU Assembler, as it grew out of an AT&T assembler. Until recently it forced this onto x86 programmers as well. Maybe that's where you picked it up. – Raffzahn Mar 25 '21 at 16:43
@Raffzahn that's still lots of ringing and the need to assign TWO labels per block of text, and that should be hidden in the macro. Either single macro definition or (for multi-line text) starting and finishing macro... And then macros depend on assembler they are written for. So I'd prefer zero-terminated strings despite maybe 1 extra instruction in a loop (and then I'll inline printing code saving CALL/RET and 27 T-states). And you have off-by-one error in your example too :) – lvd Mar 25 '21 at 17:07
@lvd Ofc it has to be a macro, except I didn't want to look into whatever assembler he uses here. And yes, one may find many reasons to use whatever one likes; still, I believe assembler is about doing it as the CPU is designed: use what is provided, as intended. Not to mention the pitfalls of delimiter-terminated memory areas. Never a good idea, not even on a 65xx, where zero termination comes naturally. It's the foremost reason for buffer overflow errors. Last but not least, every assembler needs adaptation, even for otherwise seemingly simple things like defining static strings. So a macro it is. – Raffzahn Mar 25 '21 at 17:16
The nice thing about NUL-terminated strings is that they can be as long as you have memory for. With byte-length-encoding, strings are limited to 255 chars max, and you can have problems trying to concatenate two legal strings (i.e. if the sum of the lengths >= 256 chars) which you don't have with NUL termination. Pros and cons to each approach, of course. – user7761803 Mar 26 '21 at 13:27
@Raffzahn: Zero-terminated strings are fine and safe when used for the specific purpose of performing some action sequentially on all of the characters of a hard-coded string. That's a common enough use case that it's worthwhile to have functions for the purpose of doing things like outputting all the bytes of a zero terminated string, despite the fact that they're a rubbish format for most all other purposes. – supercat Mar 26 '21 at 17:17
@user7761803 The natural way is to use a word for length, which on an 8-bit system happens to be a byte, as that's what the counter register B can hold. Using BC is not supported by DJNZ. There is a reason that most 8-bit BASICs restricted string length to 255 chars. Last but not least, the main problem with concatenation isn't a limit of 255 (or any other), but that in principle a large enough buffer of up to twice the maximum string length is needed. 255 is a very fine size, good to hold a single line of text for all common usage. No need to add lots of management for rare cases. – Raffzahn Mar 26 '21 at 17:40
@user7761803 Quintessence of a ive in that area and from single chip to mainframe programming: It's always better (even short term) to use container-like types that do not care about their content than having to analyze content all the time (one of the reasons why string processing in C is slow). All handling can be made abstract and avoid pitfalls due to unexpected data. – Raffzahn Mar 26 '21 at 17:43

Tommy · Answer 2 · score 9 · 2021-03-24T15:22:39.077

LD A, (IY+msg) looks fishy; on the Z80 IX and IY are 16-bit registers and the in-opcode offset is 8-bit. Sort of the opposite of absolute indexed addressing mode on the 6502. So you'd idiomatically load IY with the address of msg and then ld A,(IY+0). And if you're not using the offset, you might then consider (HL) instead for a more compact and faster program.

So I'm guessing that your code fails because msg sits at least partly outside of the address range [-128, 127] of whatever you seeded IY with.
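In `ld A,(IY+msg)` the assembler takes the value of the label msg itself as the displacement byte. A minimal sketch of the check it has to make, assuming nothing beyond the documented encoding FD 7E dd for LD A,(IY+d); the function names are hypothetical and VASM's actual implementation is not shown. If msg ends up at, say, 0x8010 (an assumed address), the displacement would be 32784 and cannot be encoded, whereas `LD IY,msg` followed by `LD A,(IY+0)` always can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The displacement of an (IX+d)/(IY+d) operand is a signed 8-bit
   value, so anything outside -128..127 cannot be encoded. */
static bool fits_index_displacement(long d) {
    return d >= -128 && d <= 127;
}

/* Encodes LD A,(IY+d) as FD 7E dd. Returns the instruction length (3),
   or 0 if d is out of range, which is where an assembler would report
   an out-of-bounds index offset. */
static int encode_ld_a_iy(long d, uint8_t out[3]) {
    if (!fits_index_displacement(d))
        return 0;
    out[0] = 0xFD;                 /* IY prefix */
    out[1] = 0x7E;                 /* LD A,(IX/IY+d) opcode */
    out[2] = (uint8_t)(int8_t)d;   /* displacement in two's complement */
    return 3;
}
```

Note that the assembler cannot know what IY will hold at run time; it only checks that the constant part fits the byte.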

edited Mar 24 '21 at 15:22

answered Mar 24 '21 at 15:06

Tommy

An addition - the 8-bit offset is a signed value. – Vlad Mar 24 '21 at 15:09
@Vlad ugh, you're right, I'd forgotten about that. It affects my second paragraph, so: edited. Thanks! – Tommy Mar 24 '21 at 15:23
Moving msg closer to the index resolved compiler error, thank you ! – Charles Mar 24 '21 at 16:07

score 4 · Answer 3 · answered Mar 24 '21 at 19:46

A handy way of outputting a string on the Z80 is to use something like:

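primm:
  ex (sp),hl        ; HL = return address = start of message; caller's HL saved
primmlp:
  ld a,(hl)
  call putchar      ; must preserve HL
  inc hl
  ld a,(hl)
  or a              ; a zero byte ends the message
  jr nz,primmlp
  ex (sp),hl        ; return address = the zero byte; caller's HL restored
  ret               ; the zero byte then executes as NOP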

The message to be output must immediately follow the call instruction, be at least one character long, and be followed by a zero byte and then the code that should be executed after it finishes. This function fetches the return address, outputs bytes from there until it sees a zero byte, updates the return address to that of the zero byte, and returns. Note that although the return address will be the address of the trailing zero byte rather than that of the next "real" instruction, letting the zero byte execute as a NOP is faster than adding another INC HL just before the final EX (SP),HL to skip it.
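The control flow can be modelled in C with memory as a byte array. This is a hypothetical sketch, not supercat's code: `primm_model` and its parameters are illustrative names, and putchar is replaced by copying into `out`. It shows that the first byte is printed before any test (hence the "at least one character" rule) and that RET lands on the zero byte, which the CPU then executes as NOP (opcode 00) before reaching the next real instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of PRIMM. 'ret' is the return address pushed by the CALL,
   i.e. the address of the first message byte that follows it.
   The characters "printed" are copied to out and counted in *n.
   Returns the address that the final RET jumps to. */
static uint16_t primm_model(const uint8_t *mem, uint16_t ret,
                            uint8_t *out, unsigned *n) {
    uint16_t hl = ret;              /* EX (SP),HL: HL = return address */
    *n = 0;
    do {
        out[(*n)++] = mem[hl];      /* LD A,(HL) / CALL putchar */
        hl++;                       /* INC HL */
    } while (mem[hl] != 0);         /* LD A,(HL) / OR A / JR NZ,primmlp */
    return hl;                      /* EX (SP),HL / RET: lands on the zero byte */
}
```

For a memory image holding CALL 1000h, the inline text "Hi", a zero byte and a HALT (bytes CD 00 10 48 69 00 76), calling the model with return address 3 prints "Hi" and returns 5, the zero byte; execution then falls through to the HALT at 6.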

answered Mar 24 '21 at 19:46

supercat

1. No need to do ex [sp],hl twice if HL is not to be saved: just pop hl and finally jp [hl] without ret. 2. Basically, intermixing text messages with code is not very good style. – lvd Mar 25 '21 at 08:16

@lvd: 1. Replacing the second EX (sp),HL with POP HL would make execution slightly faster if the client code doesn't care about HL, but using the EX means that there will be no need for client code to save HL if it does matter. 2. This approach saves three or five bytes at each call site, depending upon whether the client would need to maintain the HL value. While interleaving code and data is not without downsides, writing something like "RST PRIMM / DB "Hello there",0" is easier to write and read than "PUSH HL / LD HL,HelloThere / RST PRIMM / POP HL" and then... – supercat Mar 25 '21 at 15:07

...having to place the text of the message somewhere else. Even if an assembler allows interleaving content for code and data segments, doing so will make it necessary for the assembler and linker to do more work. Further, on some assemblers, if one switches to a data segment and wants to switch back to a code segment, one will have to know which code segment one was using and make certain one switches back to the right code segment. Using the PRIMM technique (which, credit be given, I first saw in the Commodore 128 Kernal even though it works better on the Z80) avoids such issues. – supercat Mar 25 '21 at 15:10

'easier to write and read' -- unless you have hundreds of such messages scattered along and mixed with the code, and then want to change some of them. Assembler considerations could have been valid in the 1980s but not now. In most cases there is not even a need for artificially segmenting the code and splitting compilation into assembling and linking, as PCs now are blazingly fast and have TONS of memory. – lvd Mar 25 '21 at 17:03

@lvd: Are you suggesting that it's easier to have a section of the source code which holds all the messages than to place messages in source code near the places where they are used? That may facilitate some tasks like internationalization, but in general it's more convenient to have messages located in source near the point of use. On some platforms, segregating code and data may offer performance advantages, and it certainly makes disassemblers easier to use, but on the Z80, unless one is interested in having people disassemble one's code, this approach has mainly upsides. – supercat Mar 25 '21 at 17:37

@lvd: What practical advantage would ld hl,HelloThere / rst primm have to justify the increase in code size and possible need to save/restore HL? Even if one were using an assembler that could easily handle that pattern, it still adds three bytes to the code size, and would only offer advantages if a particular call to a print-message routine might sometimes need to output a constant string and sometimes output a dynamically-generated one; if the output routine were "fancy", one could write PRIMM to do the ex (sp), hl, call an HL routine, and then ex (sp),hl again and return. – supercat Mar 25 '21 at 17:45

grandpa wrote his Z80 code any way he liked. The primm method was very common - I saw it before the advent of the C128. Probably even saw it with SC/MP. Would I do it now? Probably not, and I'd probably not be using a Z80 or even assembler. NIBL BASIC had the keywords spread throughout the code. It is interesting to see the diehard Z80 guys avoiding the use of IX/IY because we knew it was dog slow. I did know the # of cycles off the top of my head at one time and probably most of the opcodes. – Kartman Mar 26 '21 at 10:19

@Kartman: I do find it interesting that while the 8080 used an 8-bit ALU, the Z80 used a four-bit ALU, and yet almost all of the instructions whose performance is hurt by the four-bit ALU are new Z80-only instructions. If the Z80 had only included one index register, and made DD use it without displacement and FD use it with, it could have put the displacement byte before the main opcode, thus allowing four of the five address computation cycles to overlap the opcode fetch. The other missed opportunity, IMHO, would have been to use 64 of the ED-prefixed opcodes as 16-bit moves... – supercat Mar 26 '21 at 15:40

...which I think could have been handled in nine cycles. Not needed for moving among BC, DE, and HL, but they could have supported primed registers, SP, and HL in addition to the normal registers. – supercat Mar 26 '21 at 15:41

@supercat. I was surprised to find the z80 was a double pumped 4 bit alu. Goes to show what smart design can achieve. – Kartman Mar 27 '21 at 03:47

@Kartman: I was very surprised myself to learn it about a year or two ago, after more than twenty years of thinking it had an 8-bit ALU. As for "smart design", the Z80 really feels like whoever designed the extended instructions was expecting an 8-bit ALU or an 8+16->16 ALU. The decision to have LDIR use BC instead of B, for example, might have made sense if dec-and-check-for-zero with BC could be done as quickly as with just B, but using the 16-bit register adds an extra 2 cycles. Though what I think would have been better yet would have been to use a 4-bit counter kept in the ALU... – supercat Mar 27 '21 at 16:23

...and specify that interrupts will be deferred for the execution time (up to 56 cycles for 16 bytes); if a delay that long would be intolerable, one would need to use a shorter repeat count. That would have avoided the need to use logic to adjust the program counter after each operation to facilitate the repeat, and the need to re-fetch the instruction after each byte. – supercat Mar 27 '21 at 16:25
