
I am self-studying compilers and have gotten my hands on some very good textbooks about the subject. I am thinking of developing a compiler using the almighty LLVM infrastructure to cross-compile to old computers, initially MSX machines. The thing is, I can't get an impression of how much benefit one could get from this approach over what is already in old compilers.

Obviously, the possibility of reusing code across different platforms is one benefit, if a common compiler is to be developed, but the idea is to get the most out of such processors using modern optimizations.

I understand that some optimizations obviously don't apply, like instruction scheduling, but there are a lot of things that are done before code generation and would benefit every backend. The question I'm asking is: how beneficial would it be to deliver such a tool, compared to what is already in legacy compilers, and in modern ones too (SDCC and others)? Also, how much could this kind of effort ease multi-platform development (Uzix and the like), considering the severe restrictions on those platforms?

EDIT: I changed the subject of the question to reflect only MSX, to narrow the scope, which was too broad.

flavio
  • This feels a little too broad, and likely off-topic. Targeting "old" processors is done by cross-compilers as a matter of course, and few of the folks doing that consider what they are doing "retro". –  Apr 22 '16 at 16:07
  • There are tools for developing software for old computers, and people developing software for those computers. Writing new software for old hardware is a common activity in retrocomputing. Obviously the compiler will not be retro, but it will be used for retro. :) – flavio Apr 22 '16 at 16:27
  • Coding is coding, techniques are techniques, and tools are tools. My main issue is that this question feels too broad. –  Apr 22 '16 at 17:05
  • I think it's on-topic, but too broad. You can't do branch-prediction optimization without a branch predictor, you can't do out-of-order scheduling on an in-order CPU, you can't do cache alignment without a cache, register allocation strategies are very different if you've only got one register, loop unrolling is bad if you've only got 2K of RAM, and so on. If you want to ask about how specific techniques apply to specific processors, go ahead, but "modern compiler techniques" and "old processors" are simply too broad. – Mark Apr 22 '16 at 17:59
  • OK, I understand the point. I will change "initially MSX" to focus only on MSX, and ask specific questions about other platforms when I get to that point in development. – flavio Apr 22 '16 at 18:16
  • I'm afraid the 'cross-platform' results will be very underwhelming. Only plaintext user interface would be portable to any reasonable degree. These computers very heavily depended on platform-specific dedicated hardware - which was quite incompatible between the platforms. Graphics, IO, sound, control - that would all require libraries that either trim the results to lowest common denominator (the worst of all worlds) or have platform-specific enhancements, which obviously break the compatibility. – SF. Apr 25 '16 at 12:44
  • Hm, I think you are right in your statement, but there is a little confusion here. My question is about the compiler per se: I am actually thinking of making a cross-compiler - compile on (say) Linux, use the result on the MSX. I'm not thinking of making cross-platform apps. – flavio Apr 26 '16 at 15:39
  • It's slightly tangential, but notice that one of the benefits of a more advanced compiler would simply be supporting a more advanced language. Check out https://m.youtube.com/watch?v=zBkNBP00wJE to see how improvements to metaprogramming in C++17 allow it to express an incredibly clean C64 Pong (a simple title due to presentation length, not code overhead - the full assembly output is shown as he goes). – Tommy Oct 26 '17 at 11:30
  • @flavio ... have a look at jacobly0/llvm-z80 and watch issue/7 if you are planning to build it. –  Jan 21 '18 at 19:33

5 Answers


Places where LLVM will provide no benefit, and may reduce performance:

  • The Z80 has no CPU cache, accessing memory directly instead. Any optimizations based around increasing cache efficiency (e.g. aligning sequentially-accessed data to fit in a single cache line, or re-ordering instructions to group common execution paths together) will have, at best, no effect.
  • The Z80 is a strictly in-order CPU with just a single execution unit. Optimizations such as instruction re-ordering to permit out-of-order or parallel execution will have no effect.
  • The Z80 has a shallow-to-nonexistent pipeline and no speculative execution. Optimizations based around branch prediction or preventing pipeline stalls will have no effect.
  • The Z80 is often paired with a small amount of RAM. Many optimizations (such as loop unrolling or function inlining) make the program larger, and can't be used if the program would no longer fit in RAM (a sketch of this trade-off follows the list).
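
As a concrete illustration of that last size/speed trade-off, here is a hand-written sketch in C (invented function names, not actual compiler output):

    /* Rolled: small and RAM-friendly, but pays the loop overhead on
       every byte. On a Z80 a compiler may reduce this to a single LDIR. */
    void copy_rolled(unsigned char *dst, const unsigned char *src,
                     unsigned char n)
    {
        while (n--)
            *dst++ = *src++;
    }

    /* Unrolled by 4: fewer branch instructions executed, but roughly
       four times the code for the loop body - a poor bargain when the
       whole program must fit in a few kilobytes of RAM. */
    void copy_unrolled(unsigned char *dst, const unsigned char *src,
                       unsigned char n)
    {
        while (n >= 4) {
            dst[0] = src[0]; dst[1] = src[1];
            dst[2] = src[2]; dst[3] = src[3];
            dst += 4; src += 4; n -= 4;
        }
        while (n--)
            *dst++ = *src++;
    }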

Places where LLVM will shine:

  • Anything based on abstract analysis of the code, such as constant folding or common-subexpression elimination (a small example follows this list). Many of these techniques were developed long after the original Z80 compilers.
  • Register allocation: The Z80 has approximately 17 registers (depending on how you count), with complex rules about how you can use them. LLVM's design can draw on 30-40 years of research in this area that older compilers can't.
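
To make the abstract-analysis point concrete, here is a hand-written before/after sketch of constant folding and common-subexpression elimination (the function names are invented for the example; this shows what an optimizer does internally, not literal output):

    /* As written by the programmer. A naive compiler emits every
       operation literally, computing w + border twice at run time. */
    int area(int w)
    {
        int border = 2 * 8;                  /* folds to the constant 16 */
        return (w + border) * (w + border);  /* common subexpression     */
    }

    /* What an optimizing compiler effectively generates: the constant
       is folded and the shared subexpression is computed only once. */
    int area_opt(int w)
    {
        int t = w + 16;
        return t * t;
    }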

The improvement in abstract-analysis techniques in the past 40 years is huge. Even in the absence of hardware-specific optimizations, I'd expect LLVM-generated code to be much faster than that from older compilers, and competitive with all but the best hand-optimized assembly.

Mark
  • Do you think that intrinsics can be useful in the context of eliminating inline assembly, or will the optimizations do a better job and render them useless? – flavio Apr 23 '16 at 14:44
  • That's something you'll have to figure out by benchmarking. – Mark Apr 23 '16 at 18:22
  • Those disadvantages are common with modern MCU parts. I'm not sure if LLVM targets those effectively yet, but presumably it's only a matter of time - no one wants to maintain more architectures going forward than necessary. – Sean Houlihane Jun 02 '16 at 08:04
  • I think 'draw on research' here is the main point. 30 years ago, I was able to write a (limited and simple, of course) compiler for that day's CPUs. Now I'm sometimes not even able to understand how the code was optimized, let alone write better code than the compiler does. – Tommylee2k Oct 23 '17 at 11:48
  • Well, actually, you can benefit from branch prediction on the Z80, because a conditional jp/call/ret takes a different amount of time depending on whether it is taken. Using the faster option for the more common pathway speeds the program up. Of course, the difference is tiny and would only matter inside the tightest of loops. – introspec Oct 26 '17 at 07:41
  • The massive increase in computing resources will also allow the compiler to try more advanced optimizations than would have been possible in the past. Even the most powerful mainframes of the 70s-80s had limited computing power compared to a current laptop. – Michael Shopsin Oct 26 '17 at 16:05
  • "The Z80 is often paired with a small amount of RAM." 64KB ought to be enough for anyone! – RonJohn Sep 09 '19 at 20:23
  • Constant folding and CSE predated the original Z80 compilers. You just weren't likely to find them on a self-hosted Z80 compiler. Larger systems that could support the compilers (your DEC/DG/HP minis, your mainframes) all had that for a long time. – davidbak Sep 16 '23 at 16:58
  • @davidbak, I just gave those as examples because they're easy to understand. There are plenty of other abstract-analysis techniques that post-date the Z80. – Mark Sep 17 '23 at 04:23

Z88DK is a suite of development tools for Z80 targets that includes a couple of different C compilers (one a variant of Small C and one a patched version of SDCC) and an assembly-level optimizer that is run as a post-filter on the output of these compilers. It also has a highly hand-optimized library.

They've published a set of benchmarks against some older Z80 compilers (e.g. HISOFT-C, and SDCC without their patches/optimizer) that, depending on the application, show speed gains from 15% (Dhrystone) to 300% (Pi calculation). At the same time they show code-size reductions varying from 7% to about 90% (I presume due to elimination of large chunks of duplicate code). Floating-point performance is lower than with some of the older compilers, however - but that's because they're using higher-precision 48-bit floats rather than the 32-bit floats used by the older compilers.

It is able to target a variety of platforms including MSX, CP/M, and many of the early 80s home computers that used Z80 CPUs.
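
If you want to try it against MSX, builds go through z88dk's zcc driver. The exact options change between releases, so treat this invocation as an approximation and check the current z88dk documentation:

    zcc +msx -create-app -o hello hello.c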

Jules

I can't get an impression of how much benefit one could get from this approach over what is already in old compilers.

If there were no benefit to be had, old games would have been developed using those old compilers instead of hand-coded assembly. Old compilers were, mostly, very naive in terms of code optimization because they executed on the very same system they were producing code for, and thus suffered from a severe lack of the precious RAM that more sophisticated optimization techniques would have needed.

mcleod_ideafix
  • Yes, good point. I'm trying to get some feedback on how much benefit would be achieved, because there are some modern compilers that do good work, but improving them (SDCC; even GCC has had a Z80 code generator) apparently gets less effort than expected. I just don't know if that is because it is difficult to improve the backend for, say, GCC, or because the benefit would be minimal and people just don't bother. – flavio Apr 22 '16 at 17:25

Regular user of the MSX-C compiler here. MSX-C is a rebranded version of the LSI-C 80 compiler, bundled with an MSX-specific library. It comes with the same limitations and benefits:

  • PRO: When calling functions, MSX-C passes arguments in the CPU registers instead of on the stack (as SDCC or Hitech-C do). This results in a big performance improvement in programs that do a lot of function calling (see the sketch after this list).

  • CON: The compiler doesn't do any memory switching during the compile process. This means that you run out of memory quickly when developing complex programs, which forces the developer to split the program into smaller units and compile/link it in parts.

  • CON: MSX-C only understands K&R C. SDCC and Hitech-C support more modern dialects of C.

  • PRO: MSX-C runs natively on the MSX. There's no need to waste time compiling under Linux/Windows and moving the binary to the MSX.
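
To illustrate the register-passing point above, here is a deliberately simplified sketch (hand-written, not actual compiler output) of how a one-byte argument reaches a tiny function under each convention:

    /* unsigned char inc(unsigned char x) { return x + 1; }

       Register convention (MSX-C / LSI-C 80 style, simplified):

           ld   a, <x>       ; argument travels in A
           call _inc
         _inc:
           inc  a            ; result returned in A
           ret

       Stack convention (SDCC / Hitech-C style, simplified):

           ld   l, <x>
           push hl           ; argument travels on the stack
           call _inc
           pop  hl           ; caller cleans the stack
         _inc:
           ld   hl, 2
           add  hl, sp       ; skip the return address
           ld   a, (hl)      ; fetch the argument
           inc  a
           ret

       The extra push/pop and stack addressing on every call is where
       the stack convention loses time on the Z80. */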

In my opinion, just writing a compiler for MSX able to use the memory mapper would be a huge improvement. Running out of memory is the biggest annoyance I've found so far.
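
For context, the MSX memory mapper exposes RAM in 16 KB segments through four write-only I/O ports (0xFC-0xFF), one per 16 KB page of the Z80 address space. A minimal sketch in SDCC-style C, assuming the standard mapper ports (a well-behaved program should go through the DOS2 mapper support routines rather than writing the ports directly):

    #include <stdint.h>

    /* Mapper segment-select register for Z80 page 1 (0x4000-0x7FFF).
       Ports 0xFC-0xFF control pages 0-3 respectively. */
    __sfr __at 0xFD mapper_page1;

    /* Map the given 16 KB RAM segment into page 1, opening a window
       onto memory beyond the 64 KB the Z80 can address directly. */
    void map_segment_page1(uint8_t segment)
    {
        mapper_page1 = segment;   /* compiles to roughly: out (0xFD), a */
    }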

  • When you say "the compiler doesn't do any memory switching during the compile process", are you stating that the compiler doesn't use the extra memory for itself, or that it can't compile software that uses more memory (if present)? – flavio Apr 23 '16 at 16:46
  • Sorry for the late reply, just saw your comment now. I meant that the compiler itself doesn't use the memory. Even if your computer has 256 KB or more, the compiler only uses the 64 KB mapped into the Z80 address space, often running out of memory. – Javi Lavandeira May 04 '16 at 14:45
  • Wow, that's sad :) The compiler itself doesn't comply with the MSX standard... Thanks for the reply. – flavio May 05 '16 at 15:26
  • Nobody said that the compiler doesn't comply with the MSX standard. I only said it doesn't use mapped memory. – Javi Lavandeira May 07 '16 at 05:37
  • Mapped memory isn't a standard on the MSX? I need to read more :p – flavio May 07 '16 at 16:56
  • @flavio that's not the point. We're not talking about whether the mapper is standard or not. I just said the compiler doesn't use mapper memory even if it is available. – Javi Lavandeira May 09 '16 at 00:23
  • Ah, ok. I see my confusion now. – flavio May 09 '16 at 00:31

I would reply in a more cagey fashion. It would certainly be useful to have a better C compiler for the Z80 - the existing ones do not produce very good code. It is also clear that a number of people are experimenting, even with these not-very-good compilers, and end up creating new and useful software, some of it definitely not bad. My real concerns are as follows:

  • It is clear that you ought to be able to write a better compiler than the existing ones. It is not clear whether you can write a really good compiler - one that would tempt asm coders to switch - because the gap between good asm code and good compiled code on Z80 platforms is massive at the moment.
  • Part of the problem is that C was not really designed with 8-bit platforms in mind. The vast majority of C code assumes at least a 16-bit int, and local variables cannot be efficiently allocated on the stack and addressed on the Z80, which makes the implementation of function parameters very painful, especially with recursion allowed (a small example follows this list). It can all be supported, of course, but at the cost of generating significantly less efficient code. To address these things properly you would need to make changes to the language itself, and that will put a lot of people off.
  • To make matters worse, a lot of paradigms for efficient Z80 coding really do not fit into C. Things like movable stack pointers, self-modifying code, and partial loop unrolling are just not catered for. Like every Z80 coder, I would love to see a compiler with a bit of intuition for the use of Z80 registers, but I do not hold out much hope.
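
To make the int-width point concrete, here is a small hand-written example (the functions are invented for illustration):

    /* With C's default int, everything here is 16-bit: the counter,
       the comparison, and the accumulation all need register pairs
       and multi-instruction sequences on an 8-bit Z80. */
    int sum16(const int *a, int n)
    {
        int s = 0;
        for (int i = 0; i < n; i++)    /* 16-bit compare each pass */
            s += a[i];
        return s;
    }

    /* Sticking to 8-bit types (when n is known to be below 256) lets
       a good compiler keep the counter in B and close the loop with
       DJNZ - but nothing in standard C forces it to see this. */
    unsigned char sum8(const unsigned char *a, unsigned char n)
    {
        unsigned char s = 0;
        while (n--)
            s += *a++;
        return s;
    }
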
introspec
  • Compared with the 6502, for example, I thought the Z80 doesn't deal all that badly with C code. I'd be interested to see some concrete examples of the kinds of C constructs you mean. – Omar and Lorraine Oct 26 '17 at 09:00
  • If one has an array less than 256 bytes, there are three ways it can be placed: 1. Aligned with a 256-byte page; 2. Aligned so that it doesn't cross a 256-byte page; 3. Arbitrarily. Approach #1 is much faster than #2, which is in turn much faster than #3. C offers no way to exploit such layouts. – supercat Oct 26 '17 at 14:52
  • The C language may not support these paradigms, but there's nothing preventing a C compiler from supporting them. Loop unrolling, partial or otherwise, has been supported by optimizing compilers for decades, and aligning data for efficiency has been around even longer. – Mark Oct 26 '17 at 19:21
  • @Mark: An "all-in-one builder" [compiler and linker combined] might be able to offer a special declaration syntax and place objects suitably for use with it, but I've not seen any Z80 linkers that could reliably handle automatic object placements subject to constraints. The requirements for special syntax and incompatibility with existing linkers would make the language rather different from 'normal' C. – supercat Oct 26 '17 at 22:30
  • @Mark: Incidentally, I've sometimes wondered why no Z80 compilers support the same automatic-variable paradigm used by compilers for the 8051 and PIC (have the linker statically overlay them based on a call graph). Simply declaring local variables "static" often improves performance, but wastes memory compared with having the linker overlay them. – supercat Oct 26 '17 at 22:34
  • Why discuss the 6502 at all if we are talking about C on the Z80?! The fundamental issue that I mentioned explicitly is the allocation of local variables. You pretty much have to do it on the stack, so you have to do awkward manipulations with SP. Static allocation would break recursion. – introspec Oct 27 '17 at 10:29
  • The difficulty with 16-bit integers, which are not, after all, particularly friendly to an 8-bit processor, is another issue I mentioned very explicitly. Of course one can deal with it. My point is that dealing with it costs performance and makes the code necessarily inefficient. – introspec Oct 27 '17 at 10:36
  • @introspec no no, I am talking about the Z80. I was only comparing with the 6502, which is truly horrible to compile C to. So I wondered if you could give a Z80-specific example of something in C that doesn't translate well to Z80. – Omar and Lorraine Oct 27 '17 at 19:20
  • @Wilson, sorry, I did not see your comment. I was mostly going the other way. I know plenty of Z80 code that is so unlike C that there is no hope to even imitate it. I.e. my point is that good Z80 code is not very well matched by the paradigm offered by C. The fact that you can translate C into somewhat pedestrian Z80 code is not very relevant in this sense. – introspec Feb 15 '18 at 16:06
  • @Wilson: For code that doesn't need to use recursion or do anything too tricky with function pointers, there would be no particular difficulty making a C-to-6502 compiler that could generate decent code, and could overlay the automatic variables of functions that wouldn't be in scope simultaneously. Generating good code for the 6502 under such circumstances would likely be easier than doing so for the Z80. – supercat May 07 '18 at 20:11