
I found "Why do C to Z80 compilers produce poor code?" very interesting, as it pointed out that C (which was designed as an abstraction of a CPU to make Unix portable) is not an easy language to generate efficient machine code from for the Z80. Apparently the same is true for the 6502, where many dive directly into machine code. I read that the Sargon chess engine was very well suited to the 6502 due to its X and Y index registers.

I know that the Z80 and the 6502 are very different, but I was wondering if there are any languages on a higher level than assembly which can generate compact and efficient 8-bit machine code by design for either of them (or any other 8-bit CPU from that era), and how this was achieved?

Thorbjørn Ravn Andersen
    Do you consider FORTH high level? It generates very compact code that I'm guessing will compete with or beat C (on these processors) for speed. – Wayne Conrad May 29 '20 at 14:26
  • @hippietrail - in my experience, my assembly program had little in common with a real expert - almost like they were programming in a different language! – Jon Custer May 29 '20 at 15:01
    @jon yes. Hence the question. Assembly takes much longer to write – Thorbjørn Ravn Andersen May 29 '20 at 15:09
    As I understand it, AVRGCC produces quite decent code. Before arguing that C produces inefficient code on 8 bit architectures, one should (a) clarify whether it's a problem of 8 bit archictectures in general, or specific 8 bit architectures like the Z80 or 6502; (b) clarify whether this is an issue of the language, or something compiler specific (modern cross compilers can throw a lot more resources at optimization, plus 20-odd years of compiler development); (c) check if you're forcing the compiler to be inefficient by, say, using int where a uint8_t would suffice. – Michael Graf May 29 '20 at 15:52
    @MichaelGraf AVR is a modern CPU architecture that was designed to support C and other high-level languages. It's in an entirely different league than the "classic" 8-bit CPU architecture we talk about here. –  May 29 '20 at 20:47
    @MichaelGraf Just think "Generate efficient code for a C64 or a CP/M-80 machine" – Thorbjørn Ravn Andersen May 29 '20 at 20:48
  • @MichaelGraf It's less about the 6502 or Z80 as specific examples than about 'simple' CPUs fitting the C model vs. complex CPUs offering a different world view. – Raffzahn May 29 '20 at 21:48
  • Should the language be a good fit for multiple CPU families, or is it ok for it to natively support only one? – snips-n-snails May 30 '20 at 05:28
  • @snips-n-snails If you have a good example, please share. – Thorbjørn Ravn Andersen May 30 '20 at 07:16
  • I am surprised that no one seems to have pointed out that it is not the programming language, but the compiler, that matters – Mawg says reinstate Monica May 31 '20 at 10:11
    @mawg the linked question points out that some constructs in C are hard to create in Z80 machine code at all, even less efficient code. Naturally the compiler matters but it is about what the hardware offers. – Thorbjørn Ravn Andersen May 31 '20 at 10:30
  • What languages... the best is Macro Assembler like Merlin Pro. – i486 Jun 01 '20 at 21:05
    The design of the 6809 made C compilation relatively convenient. Probably the best of any 8-bit platform of that era. – supercat Jun 03 '20 at 19:18

13 Answers


One language that was popular on early 8-bit micros, including those that used the 6502 CPU, was Forth. Forth is exceptionally good for this use case, and superior to a C compiler, because Forth can make more efficient use of the 6502's hardware stack. Forth lacks any sophisticated method of dealing with parameters: everything is passed through the Forth stack, and procedures just deal with the stack for both their input and output. This means the language doesn't demand much from the CPU in terms of addressing modes, nor does it spend time on sophisticated effective-address calculations.

Additionally, Forth provides a somewhat different paradigm than C in that it requires a program to be built up from very primitive and efficient units known as "Words" in Forth. By combining the primitive words into ever more complex combinations, the program is built up in a way similar to Functional Programming languages. This ensures Forth is very simple (and fast) to compile, even on 8-bit machines, and that the results execute very efficiently, given that the lowest level Words were coded to be efficient on the CPU.

According to some 6502 Forth users, the typical overhead incurred by Forth programs vs. similar functionality in Assembly is about 25%. And various Forth compilers for 6502 have been implemented in as little as 1.5 KiB. This fact makes Forth likely the only language compiler you will find running from an 8-bit computer ROM cartridge. So, it is both the low overhead of the compiler AND the efficiency of the resulting code that made it a favorite of early microcomputer programmers seeking something more "productive" than Assembly language.

Brian H
    FORTH was available as compiler language? – Martin Rosenau May 29 '20 at 15:09
    @MartinRosenau Yes. Generates machine code and no interpreter or runtime is required. – Brian H May 29 '20 at 15:13
    Does this answer apply only to 6502 (and derivatives) -based machines or does it also apply to Z80-based machines? AFAIK lack of general purpose registers and difficulties with addressing would apply to both CPUs, but I'm still curious to know if there is any concrete data comparing FORTH overhead between 6502 and Z80, and if there are any languages better than FORTH for either of the CPUs. – moonwalker May 29 '20 at 15:58
    @moonwalker Forth worked quite well on the Z80 too. There's the Jupiter Ace that had a built-in Forth instead of BASIC. – Omar and Lorraine May 29 '20 at 16:03
    @moonwalker Well, there's always the Ultimate Benchmark, an ongoing effort to give a way to compare various classic computers using FORTH. Have fun. – Raffzahn May 29 '20 at 18:29
    @MartinRosenau one beautiful thing about FORTH is that a child could write a reasonable compiler for it. First, you provide implementations of a few basic words (functions) in assembly or C or whatever, and then you compile everything else (the remainder of the standard library, and the user code) as CALL; CALL; CALL; ... ; RET sequences. It's called "threaded code" (not to be confused with other meanings of "threads") – hobbs May 30 '20 at 06:21
    I think a lot of Forth compilers at the time actually generated threaded code, which would not have been as efficient as the output of an optimising compiler. The code actually run from a Forth compiler would probably not be as fast as the code from a C compiler. – JeremyP May 30 '20 at 11:12
    I worked (professionally) with Forth in the mid-80s, and don't remember any compilers that compiled to machine code. I think it would actually be less efficient in terms of space to do this, as well as in terms of time if it was implemented as subroutine calls. The standard Forth implementation stored a word's definition as a series of addresses, and the execution engine would use a jump instruction to go to the next address. Not saying it didn't happen, but I don't see why anyone would. – kdgregory May 30 '20 at 11:54
    @JeremyP: If a Z80 Forth system were to use the system stack and HL for the operand stack (top of stack is HL; everything else is on the stack), code to push a constant would be simply "PUSH HL / LD HL,xx". Four bytes, including the two-byte constant. "+" would be "POP DE / ADD HL,DE". Two bytes. Operations that need to be handled by called functions would be a bit trickier. If one wanted to do a subtract that way, it might need to "POP DE / POP BC / AND A / SBC HL,BC / PUSH DE / RET". Two extra instructions to manage the return address, but if most operations avoid fn calls... – supercat May 30 '20 at 18:51
    ...that could still be a net win. – supercat May 30 '20 at 18:51
  • AmForth works directly on an Arduino (it requires an AVR programmer to get started, but that can be achieved with a second Arduino). I have a (turn-key'ed) system that actually runs in the real world 24/7 (garage state indicator). – Peter Mortensen May 30 '20 at 22:10
  • @kdgregory: Some of the commercial Forth systems advertise (cross) compile optimisations, e.g. SwiftX - "Optimizing Compiler" - "SwiftForth uses inlined code and tail recursion to increase efficiency. More extensive optimization is provided by a powerful rule-based optimizer that can optimize hundreds of common high-level phrases". – Peter Mortensen May 30 '20 at 22:14
  • @supercat But an optimising compiler wouldn't need to do the stack manipulations in a lot of cases. If you can evaluate an expression just using registers, it's much faster than using a stack model. This is why people don't design processors based on the stack model anymore. – JeremyP May 31 '20 at 08:38
    “Forth likely the only language compiler you will find running from an 8-bit computer ROM cartridge” — that's not quite true, I'm afraid.  The BBC Micro had ROMs for AMPLE, APL, BASIC (2 versions, excluding the built-in BBC BASIC), BCPL, C (several versions), COMAL, Forth, FORTRAN, Halcien, LISP, Logo, Pascal (at least 3 versions), and PROLOG.  (cont.)… – gidds May 31 '20 at 12:22
    … Some of those were interpreted; and some (such as ISO-Pascal) were compiled to bytecode/intermediate code (which in some cases could be turned into a stand-alone executable by including relevant parts of the interpreter). But some (including FORTH and at least two versions of C) were compiled to full stand-alone machine code. – gidds May 31 '20 at 12:23
    @JeremyP: It's certainly true that peephole optimization could offer some major improvements relatively cheaply, but even the "mindless translator" approach would produce better code than one based upon calling routines for everything. As for using a stack model, it would be possible to process a stack model efficiently if an architecture required code to fit the usage pattern enforced by JVM bytecode validation, which requires that every time a particular instruction is executed, either the stack pointer must either have the same displacement relative to its value on function entry, or... – supercat May 31 '20 at 17:11
    ...there must be no downstream execution path that would examine the present state of the stack. This would allow for a compact program representation using push/pop based instructions, but allow a processor to replace each push and pop with an access to a virtual register. – supercat May 31 '20 at 17:15
  • The comment about ROM cartridge seems odd. - BBC had ROMS and I know that Forth, Lisp, BCPL and Pascal at least were available – mmmmmm Apr 08 '21 at 09:28
    Forth is a very specific language straddling the line between compilers and interpreters. One could call it a zero-pass compiler: it compiles every command as you enter it, and if it's a function definition, it compiles it and makes the compiled version immediately available to subsequent commands, be it for direct execution or for referencing in further definitions. – SF. Apr 09 '21 at 03:36
    A good bit of Forth is written in Forth, only the barest minimum in assembly, the rest bootstrapping itself using prior primitives. There's no firm distinction between the language's 'shell' and the user program, all functions living in the same dynamically built 'dictionary'; you just extend the language until it does your specific task. As a result, there are no distinct 'compiled binaries' produced by Forth: you just append your program to the end of Forth's own bootstrap, obtaining a redistributable which "under the hood" is Forth customized to perform what you need. – SF. Apr 09 '21 at 03:47

C can be greatly improved as a language for the 6502 and Z80, as well as micros like the PIC and 8051, if one abandons the notion that implementations must provide for recursive subroutine calls, and adds qualifiers for things in zero page or pointers that are limited to accessing such things, and (for the Z80) adds qualifiers to identify objects that are known not to cross 256-byte boundaries.

Ironically, platforms like the PIC and 8051 which can't really support recursion at all and would thus seem unsuitable for C end up having better C compilers than those like the Z80 and 6502 which can barely support recursion, and thus generate code which is reentrant but inefficient instead of efficient non-reentrant code.

supercat
    This is why Small C (and its many flavors) flourished in the 8-bit world. – Jim Nelson May 29 '20 at 20:05
  • What versions didn't attempt to support recursion? – supercat May 29 '20 at 20:07
  • I wasn't thinking of recursion, but other C features that were dropped in the name of space or specific machine limitations. – Jim Nelson May 29 '20 at 20:15
    @JimNelson: Support for recursion is one of the biggest obstacles to efficient code generation. BTW, another useful feature on the 6502 I've not seen in compilers would be an ability to use a "register" qualifier on arrays that would only be accessed via the subscripting operator rather than member-type pointers. Such arrays, if 256 items or less, could be stored more efficiently with all low bytes together and all high bytes together, than as low-high pairs. – supercat May 29 '20 at 20:18
    So you think of a subset of C? – Thorbjørn Ravn Andersen May 29 '20 at 20:21
  • @supercat I get it, and I thought your answer did a good job elucidating that. My point with Small C was simply that, at the time, quite a few people were of the mind to bend C to the machines' limitations rather than the other way around. That's why I discussed Action! in my answer, which disallowed recursion, took advantage of page zero, and more. – Jim Nelson May 29 '20 at 20:22
    @ThorbjørnRavnAndersen: C's syntax isn't great, but I think it would be adaptable to the purpose of programming an 8-bit micro. – supercat May 29 '20 at 21:26
  • @ThorbjørnRavnAndersen You may be thinking of Ron Cain's Small-C (which produced pretty inefficient code for 8080, in the releases I looked at, and didn't take any advantage of Z80). https://en.wikipedia.org/wiki/Small-C – user_1818839 May 30 '20 at 13:43
  • @brian I am not thinking of anything; I asked supercat. – Thorbjørn Ravn Andersen May 30 '20 at 16:12
  • @ThorbjørnRavnAndersen: In the days prior to the Standard, people writing compilers for many platforms tended to tweak the language in whatever way would fit those platforms; I think the Standard would be much better if it there had been a consensus that it should describe how implementations should endeavor to work when practical, but made generous allowances for implementations to deviate from normal behaviors if they document that they do so, and use pre-defined "quirks" macros to indicate ways in which their features and behaviors differ from commonplace ones. – supercat May 30 '20 at 18:06
    @supercat To my understanding the only language for microcomputers which really had a standard back then was Standard Pascal. I fully understand why e.g. Turbo Pascal chose to rework the string support. I am interested in any high level language which translated really well to machine code. – Thorbjørn Ravn Andersen May 30 '20 at 18:15
    @ThorbjørnRavnAndersen: It's funny that the term "billion dollar mistake" is applied to null pointers, which are better than any universally-applicable alternative of comparable complexity could have been, when C strings strike me as a much bigger "unforced error". There is one situation in which C-style strings are good, which is sequentially processing the contents of constant strings. That's a common usage scenario, but unfortunately C strings are lousy for pretty much anything else. – supercat May 30 '20 at 18:28
    @supercat C strings are as far as I know not a language feature but a library facility. This is not important for this particular question. – Thorbjørn Ravn Andersen May 30 '20 at 18:30
  • @ThorbjørnRavnAndersen: People complain about Pascal strings' 255 character limit, but storing strings requires that one either reserve space for a maximum-length string in whatever container one is holding them, or manage string-data lifetime separately from that of the container. Using fixed-allocation buffers for short strings is reasonable, but for larger strings it becomes increasingly untenable. If code assigns the output of a string function to a short string, a compiler can give the function a 256-byte buffer to accept the result, and then copy whatever portion fits to the target. – supercat May 30 '20 at 18:33
  • @ThorbjørnRavnAndersen: I mentioned strings because you mentioned Pascal's rework of them. Strings are unfortunately a language facility because the only way of allocating space for an initialized static const object within an expression is to use a string literal, which gets stored as a zero-terminated string. – supercat May 30 '20 at 18:34
  • I didn't use C on 8-bit machines, but I used PolyPascal (a sibling to Turbo Pascal) which supported recursion so it was possible to have such a language on 8-bit. – Thorbjørn Ravn Andersen Jun 01 '20 at 19:54
  • @ThorbjørnRavnAndersen: I think I used Willserv pascal for the Commodore 64, which also supported recursion. It's certainly possible to support recursion on the 6502, but what generally isn't possible is generating code that will support recursion but will run anywhere near as efficiently as would be possible if such support was not required. – supercat Jun 21 '20 at 21:59
    You would think that modern C compilers could determine that much of the code does not need to be re-entrant and optimize it accordingly. For example GCC is pretty good at that. – user Apr 09 '21 at 09:27
  • @user: Most C implementations are designed to generate code that can be invoked from code written in other languages, and would have no way of knowing what functions might be invoked re-entrantly from outside code unless explicitly informed. – supercat Apr 09 '21 at 14:50
  • @supercat that's true, although you can indicate which functions are externally accessible with the static keyword. – user Apr 13 '21 at 14:48
  • @user: One can indicate that functions are not externally accessible using static, but there is no mechanism for distinguishing functions that are externally callable only via code processed by the C implementation, versus those that may be externally invoked via means the C implementation knows nothing about. – supercat Apr 13 '21 at 14:50
  • @supercat the compiler can freely inline static functions so there is no need to distinguish if they are used by non-C code, the spec allows the compiler to simply not emit callable assembly code for them. GCC does that by default at -O2, -O3 and -Os. – user Apr 13 '21 at 15:07
  • @user: Static functions are clearly not used by non-C code, but they are also not callable from other compilation units processed by the same implementation. The issue is distinguishing functions which would need to be callable exclusively from external compilation units that the C compiler knows about, versus those which would need to be callable even from units the compiler knows nothing about. – supercat Apr 13 '21 at 17:21
  • @ThorbjørnRavnAndersen in a way, supersets even - eg SDCC when used with an 8051 platform introduced extra, platform specific keywords to deal with 8051 specific storage classes. Also, "volatile" and "register" AREN'T just cargo cult keywords to make your code look more cool on that platform :) – rackandboneman Aug 30 '22 at 21:04
  • @rackandboneman: Even on gcc, when configured for non-buggy mode (i.e. -O0), the register keyword can greatly improve code generation. In some (admittedly rare) cases, the register keyword will allow gcc to process loops more efficiently at -O0 than it would process them at higher optimization settings (e.g. if one declares register unsigned x12345678 = 0x12345678; and then uses x12345678 within a loop, gcc -O0 may keep that constant in a register while higher optimization settings would reload the constant 0x12345678 on every loop iteration. – supercat Aug 30 '22 at 21:20

I know that the Z80 and the 6502 are very different, but I was wondering if there are any languages on a higher level than assembly which can generate compact and efficient 8-bit machine code by design, and how this was achieved?

Well, a prime candidate would be Ada.

It was a specific design goal for Ada to produce good code for tiny and 'odd' microprocessors (*1). Two basic approaches enabled this:

  • the language itself was kept as non-assuming as possible, while at the same time
  • offering tools to specify certain workings in as much detail as needed, and
  • separating those details to a great degree from generic code.

The high abstraction separates it from 'lower' languages like C or FORTH which are both built around certain assumptions about how a processor works and what functions it offers. In fact, C and Forth are great examples of two major pitfalls:

  • Expecting a certain low-level behaviour of a CPU and
  • ignoring high-level functions offered by a CPU

C for example is built on pointers and the assumption that everything has an address and is a series of bytes which can be iterated over (and may be structured further, but that can be ignored at will). CPUs with multiple address spaces, object storage, or a different model of data handling will inherently end up with less-than-desirable code.

The /370 is a great example here. While (register-based) pointers are an essential feature, the memory pointed to is handled as a block (or structure) with sub-blocks (fields) that can be manipulated with single instructions, not loops (*2). C code forcing iteration onto a /370 can easily degrade (local) performance by a factor of 100 or more (*3).

Forth on the other hand is at its core built around the idea of a stack (or multiple stacks) and the ability to run threaded code. Efficient (stack) pointer handling and fast (and simple) moves to and from the stack are essential for performance. Both are things 8-bit CPUs aren't inherently good at. The 6502 may have 128 pointers, but handling them is ugly, and indirect jumps through a freely changeable location, as needed for threaded code, are effectively non-existent. Thus, fast implementations rely on self-modifying code. Then again, it is only marginally better on an 8080/Z80, as they have only one memory pointer (HL) and a backup (DE) which in turn is needed almost all the time.

Like C, Forth ignores higher-level function offerings, or has a hard time using them. Unlike C, it's a bit more open to changes in low-level behaviour.

Both languages are maybe higher-level than assemblers and can operate on a more abstract level - if used carefully - but they are not inherently abstract. They assume certain workings. If these are not basic machine instructions, performance will suffer.

A 'real' high-level language should not make such assumptions. Here, Pascal is a better candidate, as it assumes next to nothing. As a result, there are compilers for either line, 6502 and 8080/Z80, producing quite good code. I guess Turbo-Pascal for CP/M doesn't need any further introduction. On the 6502 side (Apple, Atari, Commodore) Kyan Pascal was considered a great way to work in high-level languages (*4).

Which brings us back to the original question, how to achieve good code performance on a wide range of machines:

  • Don't expose any low-level working to the programmer.
  • Have the compiler cover it.
  • Have the programmer define the intended result, not the way it is achieved.

Essentially the goals set for Ada :)


P.S.:

... on a higher level than assembly ...

Seriously? That statement feels quite offensive :)

Assembly can and often is already on a higher level than some other languages. Assembly is the essential prototype of an extensible language. Everything can be done and nothing is impossible.


*1 - Note the 'produce' clause, having the compiler run on such machines is a different story.

*2 - It's always helpful to keep in mind that the /370 may have spearheaded many modern concepts, but it was designed with a punch card in mind. A punch card is a record, maybe pointed to by a register, holding information (fields) at fixed offsets with fixed lengths. The whole instruction set for character (byte) manipulation is built to fit. No need to loop over two fields to move, compare, translate, pack or even search within them; the most basic instructions (MVC, CLC, TR, PACK, TRT) already handle whole fields at once.

*3 - This was a huge problem when C first became requested by users and implemented. By now compilers have evolved, and more importantly, CPU designers have added quite some 'changes' to cover up for the inefficiency of C.

*4 - Its only fault was its late 'birth' - too late to make a major impact.

Raffzahn
    I agree, Pascal would be the obvious choice, but Ada is a good fit. – Mark Williams May 29 '20 at 19:13
    Thank you for being thorough. I'm sorry if you are offended by asking about a higher level than assembly - wikipedia says "In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. ". If your assembly language code abstracts strongly from the details of the computer, is it still assembly or some higher level language implemented in assembly macros? – Thorbjørn Ravn Andersen May 29 '20 at 20:17
  • @ThorbjørnRavnAndersen No harm done :) Good question; say, is C with all the macro preprocessing and libraries used today still C? – Raffzahn May 29 '20 at 21:44
    Ada works as long as you relinquish the idea of being able to compile a program on the target architecture. My University created an Ada compiler which was by no means fully featured but required an IBM PC with 4Mb of extra RAM. – JeremyP May 30 '20 at 11:17
    By the way, the 6502 does have an indirect JMP but Forth requires two stacks. Implementing the data stack even using a zero page pointer is going to result in pretty slow code. – JeremyP May 30 '20 at 11:23
  • @JeremyP Does it? From a changing location? Otherwise it's of not much help for threaded code. Putting resource requirements on the language seems off. I can't speculate about the project at your university, but it may be safe to say that writing an optimized version capable of running with limited resources wasn't part of the requirement list. There are Ada subset compilers implemented to run on a 64 KiB Apple. – Raffzahn May 30 '20 at 12:24
    A major issue with Ada is that the runtime library for the full language is very heavy. Ada 83 supports concurrency in a very first-class way, as well as exceptions, tagged unions, dynamically sized arrays of anything (including tasks, and as members of tagged unions), and other cool stuff. All that stuff fit onto an x86 (I was part of the Alsys team that created the first validated Ada 83 compiler for x86, hosted and targeted for the IBM PC running DOS) - but it would be a tight squeeze on an 8 bit CPU and would require whole program compilation (to leave out features you didn't need). – davidbak May 30 '20 at 16:13
    @davidbak yes. That's true. A full standard-compliant compiler on an 8-bit machine would be a marvel - even more so with a full library. Then again, no one says the library has to be monolithic. And yes, the best approach to Ada on a small system is always a full compile. This was and is still true - or invest quite some brain power in linking. I'm not saying that Ada is the best solution for everything (besides, it still is), but that any language that focuses on a (truly) abstract description of what to do will avoid many pitfalls and may be compiled for many quite different CPUs. – Raffzahn May 30 '20 at 17:24
  • Agree with what you say ... except: Obv the library must not be monolithic but in addition codegen changes with concurrency: e.g. you have a record containing a field that is private (or maybe limited private, I forget) and that private field's type - different comp unit - may or may not contain a task ... There are semantics associated with the destruction of the record that are different if there is a task in it - which you don't know when you're at the code which destructs the record ... so you have to compile as if a task might be there ... whole program makes sense for 8-bit though! – davidbak May 30 '20 at 17:37
  • @davidbak All with you on that. At the same time, those constructs may not be needed in the first place when writing an application for a single-tasking single-user system. I guess we both see that these arguments are brought on a very high and quite luxurious level, isn't it? Even by stripping Ada of many basic features (like concurrency) the remaining language would still be way more portable and fine-tuned at the same time. Heck, it was Ada (and a quite sensible pile of money) that one time made me touch an 8051 with anything but a torch. – Raffzahn May 30 '20 at 17:55
    There are indeed a lot of features in Ada that work nice on small systems, embedded systems - rich primitive types (including ranges of integers) among them. Someday I'd like to hear more of your 8051+Ada experience! – davidbak May 30 '20 at 17:58
    @davidbak And that's why, to me, it is close to a perfect language for 8-bit systems. The application wasn't anything remarkable. Just a controller for an operator panel with two membrane keyboards, several seven-segment displays, a bunch of LEDs and two serial ports. Software and hardware were rather simple. The nice part was the ability to constantly self-test all items, reporting any (upcoming) failure as well as denying usage in case of detected issues. It was part of a medical device, so higher standards. That's also the reason why the dreaded 8051 was used: it was considered proven working. – Raffzahn May 30 '20 at 18:16
    @davidbak Ada permits pragmas restricting use of the runtime to allow smaller runtimes (e.g. Ravenscar with a restricted tasking model, or even a ZFP (Zero Footprint) runtime). Something like "blinky" on AVR-Ada is a couple hundred bytes, and most of that is because the linker plants the C runtime in there (which isn't all used by Ada code) and my digital watch on MSP430 is under a kilobyte. – user_1818839 May 31 '20 at 14:22
  • @BrianDrummond - I'll look into that, thanks. – davidbak May 31 '20 at 14:43
    One facet of C's byte fixation is that it doesn't allow bit addressing, while many MCUs and even some CPUs (Z80) offer ways to address individual bits (bsf, bcf, btst instructions). This means that an Ada programmer can declare an array of booleans or a Pascal programmer can use a set, while the C programmer has to use ANDs, ORs, bit masks and their complements, (easy to get wrong!) and the compiler has to reverse engineer the intent into a bit instruction. – user_1818839 Apr 08 '21 at 14:54
  • @BrianDrummond: The 6502 also has the ability to quickly test bit 7 of any byte, or bits 6 and 7 of any directly-addressed byte, but tests for bit 6 are hardly idiomatic in C. – supercat Apr 08 '21 at 18:14
  • Was there really any implementation of a Ada compiler/crosscompiler/interpreter for 6502? I found some Ada compilers for CP/M machines but no luck for 6502. I'm not sure if Abacus Ada on C64 exists or how much it implements Ada. – Schezuk Feb 01 '23 at 14:46

"Forth" was the first name that jumped to my mind. Another is Action!, an Atari 8-bit-specific language. (Its manual can be found on the Internet Archive.)

Action! is a structured Algol-inspired language that borrowed constructs from other languages (in particular, it offered C-like pointers and arrays) and native types that mapped cleanly to the 6502's memory model. Techniques that usually required assembly were possible, such as positioning code/data and trapping interrupts. Action! kind of stood between a full-featured macro assembler and a high-level language like Pascal. (That it didn't have native floating-point support or recursion is a hint of how pared-down it really was. This page has a nice summary of its limitations.)

I couldn't find hard numbers showing it was more efficient or faster than the Atari C compilers of the time, but this article from Hi-Res magazine shows Action! finishing a Sieve benchmark in roughly the same time as a Z-80 C program.

Action! also offered a surprisingly full-featured IDE before the acronym was coined: an integrated full-screen text editor, in-memory compilation (which made it quite fast), and a monitor for debugging. Action! showed me how good tools make a big difference in the edit-compile-debug loop.

Jim Nelson
  • 3,783
  • 1
  • 18
  • 34
  • 2
    Action! was great. I was in high school when it came out, and it was a pleasure to use: cartridge-based, with a great editor and a fast compiler. – mannaggia May 08 '23 at 14:27
13

Ada, for cross-compilation; though there WERE native Ada compilers (e.g. Janus Ada, with a Z80 (Ada-83) release here and reviewed in 1982 here), it was stretching the capabilities of a 64-Kbyte machine. Side note: the response to the review was by Randy Brukardt; in 2020 he is still selling Janus Ada and actively contributing to the comp.lang.ada newsgroup!
So Gnat (utilising gcc, and soon LLVM) can run on any decent host and optimise pretty well for small targets, nowadays AVR or MSP430. Ada is in some ways easier to optimise than C.

But one other candidate worth mentioning for native compilation would be Modula-2. A much smaller simpler (and yes, more restricted) language, rather in the Pascal mode, but much more amenable to compile on a decent Z80 system. I had the FTL Modula-2 compiler running on a Z80 CP/M system.

I don't remember specific benchmarks on Z80, but on slightly larger 8086/8088 systems (where "small model" executables were 64K) the JPI Topspeed Modula-2 compiler of the 1980s was probably the most efficient compiler for 8086 of any language in the DOS era.

user_1818839
  • 640
  • 4
  • 9
13

The main problem for high-level-languages on these platforms, and especially the 6502, is the small hardware stack. 256 bytes does not give one much room to work with for languages that intend to push large activation records on the stack.

As others have noted above, the solution is to remove recursion from your language definition, and in a more general sense, any "local" information.

Also worth mentioning: in the 1970s and early 80s, when these machines were the bomb, the languages all the cool people were working with were the many variations of ALGOL. Most mainframe systems had a "systems programming language" based to some degree on ALGOL, and later on Pascal, once that became, effectively, the "new ALGOL". C did not become the universal solvent until the 16/32-bit machines had been in the market for some time.

So for instance, on the Atari you had Action!, an ALGOL-derived language with no recursion. This not only reduced stack use, but also greatly reduced the complexity of a procedure call: you basically just did the branch. This latter bit remains a topic of discussion to this day, as in Swift, where Apple tries to convince you to use struct instead of class to reduce call overhead.

Raff mentioned Forth, which was designed as a multi-platform language that used its own stack structure to provide C-like capabilities on machines that lacked the requisite hardware support. While I guess it was a success in that respect, I recall trying to program in it and having feelings much the same as drinking way too much cheap gin.

Maury Markowitz
  • 19,803
  • 1
  • 47
  • 138
  • The hardware stack is not a limiting factor. Any modern C compiler is very good at keeping active variables in registers; and anyway, variables that cannot be stored in zero page, such as a C stack, will likely be stored in 16 bit memory. – johnwbyrd Apr 09 '21 at 07:04
  • 2
    Modern C compiler in 1980 on an 8-bit machine? Registers? We're talking about an 8-bit 6502 with a single ACC and an 8-bit SP. – Maury Markowitz Jul 27 '22 at 16:01
11

Despite the other answers posted here, Forth generally performs significantly worse on the 6502 than an optimizing C cross-compiler like CC65. In tests I did comparing it to Tali Forth 2 for the 65C02 [1], which generates subroutine-threaded code (STC), the fastest style of Forth code, Forth is sometimes on par with the C equivalent but more often 5-10 times slower. As far as I can tell, these are the main reasons:

  1. All values pushed on the stack in Forth become 16-bit, which takes the 6502 a lot longer to manipulate than 8-bit values. C, on the other hand, has 8-bit types which are much faster to work with.

  2. Forth words constantly adjust the data stack as they push and pop things, while C functions tend to do most of the stack allocation at the beginning and end of a function, which is much more efficient.

  3. 6502 Forths don't generally do any optimization, even when enough information exists at compile time to do so. Something like "drop 5" in Forth will increase the stack pointer to do the drop then immediately decrease it to push the 5, so you get the useless series INX / INX / DEX / DEX. CC65 optimizes this type of inefficiency out in some but not all cases.

  4. 6502 Forths also don't optimize for constants. CC65 outputs more efficient assembly for something like "foo<<3;" than "foo<<bar;" since the number of shifts is known at compile time. Forth generates the same code in both cases, always using the most compatible but slowest version.

  5. Constraining the programmer to only modifying the top levels of the stack produces less efficient code. For example, you can't step over the first item on the stack and add something to the second. The equivalent "swap 5 + swap" wastes time on the two swap operations to get the value to the top of the stack and back into second place, while C can just directly modify any item on the stack.

CC65 is not perfect, but you're unlikely to get anything near as fast as that without writing the assembly yourself.

[1] http://calc6502.com/RobotGame/summary.html

Joey Shepard
  • 111
  • 3
  • 1
    Do you have a link to the programs you compared? – UncleBod Jun 20 '20 at 17:20
  • @UncleBod, I've been working on a web page to show the tests and my data. I'll try to post it here in the next day or two. – Joey Shepard Jun 20 '20 at 17:22
  • 1
    @UncleBod, link to the test I did: http://calc6502.com/RobotGame/summary.html – Joey Shepard Jun 30 '20 at 12:55
  • If one didn't need to go beyond 256 words of stack, I would think a 6502 forth-like language could be reasonably efficient if it e.g. stored the low bytes of all 256 stack entries at $9E00-$9EFF and the high bytes at $9F00-0x9FFF. Something like "add" could be "lda $9E00,x / clc / adc $9E01,x / sta $9E01,x / lda $9F00,x / adc $9F01,x / sta $9F01,x / inx". – supercat Apr 08 '21 at 15:49
  • @supercat, you can get twice as much stack space like that, but each of the 6 memory accesses you have for "add" will be 1 cycle slower than putting a smaller stack in zero page. Also, you can't use the values on your stack as pointers since they have to be in zero page and have to be stored contiguously. Storing them in higher memory also doesn't solve any of the speed problems above. On the other hand, 128 or less words of stack in zero page is way more than you'll probably ever need in Forth. – Joey Shepard Apr 09 '21 at 18:04
  • @JoeyShepard: Even if one keeps the stack in zero-page, storing the high and low bytes separately would eliminate an inx or dex every time it's necessary to move the pointer. If the 6502 had made (zp,x) addressing capture but mask off the lower bit of the operand byte when doing address calculation, and then forced the LSB of the target address high if the lower bit of the operand byte was high, that would have made (zp,x) addressing useful for accessing 16-bit values whose address was selected using the X register, but accessing the upper byte of a two-byte value effectively requires (zp),y. – supercat Apr 09 '21 at 19:07
8

For the 8080, 8085 and Z80: possibly PL/M. It generated exactly what you told it to. It also had I/O built into the language. With most other compilers, you had to call

output(0x20, 0x90)

but in PL/M it was built in:

output(0x20) = 0x90

would generate the out instruction directly. There was a similar input construct. The part of PL/M that always caught C programmers out was that even numbers were false and odd numbers were true. PL/M later gave rise to PL/M-86 and PL/M-286.

The use of Forth varies:

  1. as a compiled language
  2. as a concept with a generic interpreter
  3. as a concept using indirect threaded code (https://en.wikipedia.org/wiki/Threaded_code#Indirect_threading) with a home-brew interpreter
  4. as a concept using knotted code (also known as token threads) with a home-brew interpreter.

I've seen 3 and 4 but not 1 or 2. Options 3 and 4 are normally used to reduce code size, but the program runs more slowly than if it were written in straight code. In the late 70s and early 80s, when information was obtained from journals, it wasn't easy to find a Forth compiler, so most of the time it was a home-brew version and everything was written in assembler.

Edit: Token threads

I've looked through my old notes on Forth but I can't find the reference that describes the use of token threads in Forth, so I'll describe it here. This is all about saving space on space-limited machines.

Direct threading: In normal assembler, we have something like

main:  call func1
       call func2
       call func1
       call func3
       call func1
       call func2
       call func3

8/16-bits: 21 bytes 32-bits: 35 bytes

Just adding 32-bit for completeness. This does not apply to Z80s.

Indirect threading: For a 16-bit machine, there are 3 bytes for every call; for a 32-bit machine, 5 bytes. In most Forth implementations, there is a small piece of code (the exec) that reads the address and executes it, a bit like an indirect function call in C/Fortran. This saves 1 byte per call, which is significant if the total saving exceeds the size of the exec code.

main: word func1
      word func2
      word func1
      word func3
      word func1
      word func2
      word func3

8/16-bits: 14 bytes 32-bits: 28 bytes

Token threading: If there are a large number of these calls, and there aren't more than 256 distinct functions, it is possible to save more space by using a lookup table

lut:  word func1
      word func2
      word func3
main: byte 1
      byte 2
      byte 1
      byte 3
      byte 1
      byte 2
      byte 3

8/16-bits: 13 bytes 32-bits: 19 bytes

This is a further saving of 1 byte for every call in 16-bit and 3 bytes for every call in 32-bit (the totals above include the lookup table itself): again, provided the saving exceeds the size of the exec.

One byte may not seem significant but in a tightly packed PROM, sometimes it makes the difference between having an 8K ROM or redesigning to accommodate a 16K ROM.

cup
  • 2,525
  • 1
  • 9
  • 21
  • Do you have a pointer to information about "knotted code" or "token threads"? web search (for me) turns up only this answer for either of those terms ... – davidbak Aug 22 '22 at 20:26
  • 1
    I'll have a look through my photocopies of my old Forth docs to see if I can find the actual reference. It will take a while - it is up in the attic and hasn't seen daylight for about 30 years. – cup Aug 23 '22 at 07:06
  • Notably, PL/M functions are not required to be re-entrant so their local variables get stored in memory rather than on the stack. However, there is still room for improvement; at least with the compiler used to build CP/M, the code generated seems to include a number of unnecessary loads of values that are already in registers. – john_e Aug 28 '22 at 09:35
    Very often, the people who do the code generation don't really know the underlying assembler. The compiler emits an intermediate language which is translated into the underlying language. This can lead to unnecessary loads. Many of these "home brew" intermediate languages (at least, the ones I have used) don't have the concept of registers. – cup Aug 28 '22 at 10:00
  • "That generated exactly what you told it". Code is rarely written to generate tight assembly code, so this might be sub-optimal. Do you know if any optimizations were in place for 8-bit machines? – Thorbjørn Ravn Andersen Aug 28 '22 at 14:12
  • 1
    For PLM, the optimizations are listed in chapter 12 of the manual http://bitsavers.trailing-edge.com/pdf/intel/PLM/9800268B_PLM-80_Programming_Manual_Jan80.pdf – cup Aug 29 '22 at 07:08
6

I suggest PLASMA (https://github.com/dschmenk/PLASMA), a C-like language that compiles to interpreted code. It has much higher code density than assembly language, and it's much faster than FORTH.

peter ferrie
  • 1,314
  • 2
  • 11
  • 25
  • 1
    That's the third PLASMA programming language I heard of. The most well-known is probably Carl Hewitt's PLASMA language he designed to illustrate his newly-invented Actor Model of computation, and a member of the MIT AI family of languages (LISP, PLANNER, Microplanner, PLASMA, Scheme). The second one I forgot. And just now I found this: https://plasmalang.org/ . – Jörg W Mittag Jun 01 '20 at 16:20
  • @wilson It could compete favourably with even compiled Forth, based on the examples of compiled Forth that I have seen, but maybe I've been looking at the results of a poor compiler. – peter ferrie Jun 05 '20 at 22:51
6

It comes down entirely to the effort put into the code-generator back-end. C is an abstract language; it doesn't need to directly reflect what the machine is doing. But this is the sort of stuff that would be state-of-the-art in 2020, and would require significant investment. There's nothing inherently special about the Z80 or 6502 in this respect - only that the impedance mismatch between some platforms and the code-generator back-ends is very high. For the Z80 and 6502 it wouldn't matter what the language is, because the specifics of the language are far away and dissolved by the time the intermediate representation gets to the optimizer and code generator. Any high-level compiled language would be just as bad on the Z80 and 6502 as C is, pretty much.

We're spoiled with excellent modern compiler back-ends. The trouble is that they are so commonplace that everyone thinks it's "easy" work. Not at all. They represent man-decades of effort for anyone who would try to reproduce them.

So, you can get excellent Z80 and 6502 code out of a C compiler if you hire a couple of LLVM back-end experts away from Apple and Google, pay them the going rate, and let them at it for a couple of years. A couple of million dollars is all it'd take, and you'd grace the world with absolutely amazing Z80 and 6502 code produced from both C and C++.

So: I'm sure the results would be excellent - but it requires lots of effort, of a sort that historically has not been expended by even major silicon vendors, with the exception of Intel, Digital and IBM. Zilog's own compilers (all of them, no matter the release year) are junk compared to the x86 or ARM output from C code passed through Clang and LLVM. All the effort put in by, say, the Zilog and Motorola compiler teams throughout the 70s, 80s and 90s, taken together, was completely eclipsed by the man-hours that went into Clang+LLVM in the first decade of those projects' existence. Zilog's and Motorola's market share, back when they still had plenty of it, didn't improve matters here: they were a bit too early, and the everyday techniques used by e.g. LLVM either weren't available or required so much memory and so many CPU cycles that it wasn't feasible to offer such products to a wider audience - you pretty much needed a heavy minicomputer or a top-notch workstation to do this sort of work.

  • 2
    Do you have actual experience with 8-bit cpus? – Thorbjørn Ravn Andersen Jun 28 '20 at 10:35
  • 2
    Semi-regrettably so. I'd say half of the code I wrote over the last 3 decades of me writing code was for 8-bit CPUs. When I bash Zilog and Motorola compilers, it's based on experience all too intimate :/ But in short: If humans can write acceptable Z80 assembly, so can a C compiler that got sufficient amount of money and talent dumped into it, under leadership of a suitable luminary. That's "all" there's to it, at the end of the day: how many $$$ were spent (or spent not) on that problem - and were they spent productively. – Kuba hasn't forgotten Monica Jun 29 '20 at 02:14
  • 2
    Getting good C or Pascal-language performance requires foregoing recursion in all cases where it isn't needed, and using a linker that can statically overlay automatic objects ("local variables") used in different functions. I don't think LLVM is really set up to accommodate such things. Further, modern compiler back-end optimizers seem to have baked-in assumptions about the relative costs of various actions, and generate sub-optimal code for platforms where the actual relative costs of actions don't fit those assumptions. – supercat Jun 29 '20 at 16:46
  • @ReinstateMonica: An annoyance I have with the design of C is the lack of byte-based pointer arithmetic operators. When using arrays of 16-bit values in machine code, especially on the 16-bit 8088/8086, or even more so on the 68000, it often makes sense to have indices step by two to exploit the [bx+si] or [si+const] or @(Ai+Dj.w) addressing modes, but there's no nice way to express such concepts in C. – supercat Jun 29 '20 at 17:21
  • 3
    "We're spoiled with excellent modern compiler back-ends. The trouble is that they are commonplace that everyone thinks it's "easy" work." This seems to be almost impossible to explain to people who keep saying that this or that optimization just should be there, no big deal. I really appreciate you saying this so explicitly. – introspec Jun 29 '20 at 20:24
  • @supercat LLVM can do it, but not natively. That is not a big deal - you basically convert all non-recursive locals that aren't in registers to statics, then run a script on all IL modules to extract the call tree and those statics, then you generate a link map with all the statics properly overlaid in their own section, and presto. Not out of the box but <500 extra lines of scripting needed. I'm using that in production and it seems to work so far. – Kuba hasn't forgotten Monica Jun 29 '20 at 22:43
  • @ReinstateMonica: Even if LLVM could manage static overlays with the assistance of outside tools, I would still be skeptical of its optimizer's usefulness for most practical purposes involving small CPUs. From what I can tell,optimization efforts are aimed much more at finding clever ways of optimizing out constructs that programmer could have omitted if they didn't want them, than in exploiting simple peephole improvements. Clang might be less absurd than gcc, but neither impresses me when targeting the Cortex-M0, which should be an easier target than the 6502 or Z80. – supercat Jun 29 '20 at 22:59
  • Peephole optimizations are a dead-end, a scourge of compiler textbooks. You can't implement them properly - they pretty much work by happy coincidences. When doing peephole-style assembly rewriting, you need full semantics of the instructions, and you must operate on higher-level data structures than merely by matching patterns on target instructions. I've been there, and if you give me a compiler with peephole opts, I can always add one innocuous looking peephole optimization that will be beneficial 99.999% of time, but produce wrong output sometimes. Semantic instruction selection is it. – Kuba hasn't forgotten Monica Sep 16 '20 at 23:33
  • @Kubahasn'tforgottenMonica: By "peephole improvements", I was referring to the concept of optimizations which are performed without having to do a broad complicated analysis of the program as a whole. One may need to scan for which parts of registers are potentially live before an optimization stage, but things like "replace a sign extension instruction with nothing if the upper bits of the register won't be live at the following instruction" are both simple and safe. – supercat Jan 19 '21 at 18:22
4

> I know that the Z80 and the 6502 are very different, but I was wondering if there are any languages on a higher level than assembly which can generate compact and efficient 8-bit machine code by design for either of them (or any other 8-bit CPU from that era), and how this was achieved?

I've been working on my own high-level language, "Higgs", which targets the 6502, 65C02, 68000, 68040 and a RISC DSP, and I recently started working on a Z80 backend.

The output (build script called from within Notepad++) is an assembler file that is then fed into the local assembler/linker of the respective platform.

The feature list of the language depends directly on the target platform's abilities. Each HW target has a different set of unique features, dictated by the addressing modes and asm capabilities of the platform. Arrays on the 6502 are very different from arrays on the 68000 or a RISC DSP.

Each target, however, supports global/local/register variables, global/local constants, structures, arrays, functions (with optional parameters), loops, conditions, nested blocks (which help with formatting and namespace pollution), 3-parameter math expressions, signed math (if present), and increment/decrement (var++, var--).

My basic rule is that I never include a new feature unless I can guarantee that the code generated by my compiler is identical to the code I would write manually, directly in ASM.

From experience of writing my own game in it (~25,000 lines of Higgs so far), it's exponentially faster to write/debug/test new code compared to ASM. Less than 0.01% of code is still written in ASM, the rest is Higgs.

I will be adding a Z80/Next backend soon.

If you could have only 3 features to increase your productivity, these give you the most return:

  1. conditions
  2. math expressions
  3. scope-based variables/constants {}

Here's an example (68000 target: hence d0-d7/a0-a7 registers, .b, .w, .l sizing, etc.), showing how high-level it is compared to ASM. It really feels almost like C, and is thus very easy to come back to after 6 months and quickly understand and adjust (unlike hand-written ASM, which mostly evokes deep WTF feelings):

Render_LaserShots:
{
    local long lpMain
{   ; Player LS
    colorQuad = #$FFA080
    SLaserShot.InitRegister (LaserShots)
    loop (lpMain = #MaxLaserShots)
    {
        if.l (SLaserShot.IsActive == #1)
        {
            d1 = #0 - SLaserShot.X
            d2 = SLaserShot.camY
            d3 = #0 - SLaserShot.camZ
            SetCamPos32 (d1,d2,d3)
            Render_obj3DList_Object (LaserShotMeshPtr,#PolyCount_LaserShot)
        }
        SLaserShot.Next ()
    }
}
{   ; ShootingEnemy  LS
    SEnemy.InitRegister (MainEnemy)
    if.l (SEnemy.State == #AI_STRAFE)
    {   ; Only Render Enemy's LS if he is active
        colorQuad = #$40FF40
        SLaserShot.InitRegister (EnemyLaserShots)
        loop (lpMain = #MaxLaserShots)
        {
            if.l (SLaserShot.IsActive == #1)
            {
                d1 = #0 - SLaserShot.X
                d2 = SLaserShot.camY
                d3 = #0 - SLaserShot.camZ
            ;   print3 (d1,d2,d3,#50,#20)
                SetCamPos32 (d1,d2,d3)
                Render_obj3DList_Object (LaserShotMeshPtr, #PolyCount_LaserShot)
            }
            SLaserShot.Next ()
        }
    }
}

rts }

3D Coder
  • 141
  • 2
  • It will be very interesting to see what you end up with languagewise. – Thorbjørn Ravn Andersen Jun 21 '20 at 14:00
  • 1
    Note that a good optimizing compiler can do tricks in assembly that may not be what you would write by hand. – Thorbjørn Ravn Andersen Jun 21 '20 at 14:01
  • 1
    @ThorbjørnRavnAndersen : True - once the code is done, there are so many things you can do to butcher the code and make it faster: remove a CLC/SEC here and there, or notice you could (ab)use your index registers in a way that makes them directly reusable in the next stage. But that creates totally unmaintainable code. Now, I've done that in the past, but for me it's of the highest importance to be able to come back later and adjust the code, which is impossible once you butcher it manually. Since the code is high-level, I have zero qualms about discarding it altogether (unlike hand-butchered code) :) – 3D Coder Jun 29 '20 at 07:56
  • @ThorbjørnRavnAndersen: Short-term, my ToDo list shows in Top 3: function return values and classes. Those two seem to promise to bring most productivity. After that, I would probably focus on multi-platform codebase - e.g. writing game code once, but reusable on multiple HW targets. – 3D Coder Jun 29 '20 at 08:05
3

This is my experience with C on Z80 and 6502:

  • Zilog Z80/z88dk

    The generated code is pretty decent: not as good as handwritten assembly, but good enough for lots of purposes. One advantage of the Z80 with respect to C is the existence of the IX/IY registers, which are used for local-variable access and parameter passing. Of course they aren't as efficient as register parameters, but they still work well with the C paradigm. I tested switching to static variables and found there was a difference, but a small one.

  • 6502/cc65

    I'm not too familiar with 6502 assembly, but I'm aware of the general architecture. Whether I compiled the code with or without static variables made a very big difference on the 6502 (IIRC up to 50%). I can't compare with handwritten code since I have no experience there.

Bottom line: there is a big difference in processor architecture. The Zilog Z80 is much more C-friendly: it has a decent stack and index registers that allow quite straightforward implementation of many C paradigms, calling conventions, etc. The 6502 is much more limited when implementing re-entrant code or using stack-based variables.

Sep Roland
  • 1,043
  • 5
  • 14
Artyom
  • 273
  • 1
  • 7
  • 3
    My experience is that on the Z80, there is a huge cost difference between the cost of entering a function that uses no parameters or automatic objects, versus calling one that uses some. Once a function is entered, the cost of accessing 8-bit automatic objects using IX addressing isn't too totally outrageous, but accessing 16-bit values using IX often costs at least twice as much and sometimes more. By contrast, 16-bit instructions like mov hl,(addr) or mov de,(addr) are less than 25% more expensive than the 8-bit mov a,(addr). – supercat Jun 21 '20 at 21:57
0

The assumption in the question is incorrect. The common wisdom that C is a bad choice for the 6502 is wrong.

The choice of source language does not gate the quality of generated code for the 6502.

In fact, code quality on the 6502 (or any other platform, for that matter) is gated by the quality of lowering intermediate-representation code to MOS instructions.

Any 6502 codegen must take into account the 6502's architectural quirks: there are few real registers, but there is also a bunch of zero-page memory.

High-quality C codegen on the 6502 is possible, but it's not quick to implement. In particular, it cannot be done with a toy or hobbyist compiler.

See https://www.llvm-mos.org for an example of the foregoing.

johnwbyrd
  • 109
  • 2
  • 2
    What non-toy or non-hobbyist compiler do you know of doing this? – Thorbjørn Ravn Andersen Apr 09 '21 at 20:31
  • 1
    http://www.llvm-mos.org – johnwbyrd Jul 28 '22 at 20:48
  • @johnwbyrd: There are many situations where an optimizing transform would likely be unlikely to adversely affect the behavior of a C program, but cannot be proven sound under the rules of the language. An optimizer that performs such transforms may be able to yield better benchmark results than one which refrains from making optimizations that cannot be proven sound, but such optimizers should nonetheless be recognized as toys unless they document all the cases where optimizations may yield results inconsistent with the language spec--something that so far as I can tell LLVM has yet to do. – supercat Jul 28 '22 at 21:56
  • https://llvm-mos.org/wiki/Current_status#C_compiler Looks interesting. Couldn’t find anything more detailed about the code quality though. – Thorbjørn Ravn Andersen Jul 28 '22 at 23:32