24

Looking through the source code of the 6502 MS-BASIC, certain parts of it seem more reminiscent of how things would be done on the 8080 than on how they should be done on the 6502. Code to find a line with a specified number, for example, is relatively performance-critical, but Bill Gates' code seems less than optimal:

FNDLIN: LDWX    TXTTAB      ;LOAD [X,A] WITH [TXTTAB]
FNDLNC: LDYI    1
        STWX    LOWTR       ;STORE [X,A] INTO LOWTR
        LDADY   LOWTR       ;SEE IF LINK IS 0
        BEQ FLINRT
        INY
        INY
        LDA LINNUM+1    ;COMP HIGH ORDERS OF LINE NUMBERS.
        CMPDY   LOWTR
        BCC FLNRTS      ;NO SUCH LINE NUMBER.
        BEQ FNDLO1
        DEY
        BNE AFFRTS      ;ALWAYS BRANCH.
FNDLO1: LDA LINNUM
        DEY
        CMPDY   LOWTR       ;COMPARE LOW ORDERS.
        BCC FLNRTS      ;NO SUCH NUMBER.
        BEQ FLNRTS      ;GO TIT.
AFFRTS: DEY
        LDADY   LOWTR       ;FETCH LINK.
        TAX
        DEY
        LDADY   LOWTR
        BCS FNDLNC      ;ALWAYS BRANCHES.
FLINRT: CLC         ;C MAY BE HIGH.
FLNRTS: RTS         ;RETURN TO CALLER.

The code uses a lot of INY and DEY instructions, even though the value of Y at any given spot in the code will always be the same; using an LDY #1 rather than DEY at AFFRTS would have allowed BEQ FNDL01 / DEY / BNE AFFRTS to be replaced with BNE AFFRTS. Further, INY INY could be cheaply replaced with LDY #3 and eliminate the need for the preceding LDY #1. Further, the use of X:A to hold a temporary high/low address foregoes the possibility of using (ind,x) mode [with x set to zero] which could help eliminate some of the gymnastics with Y.

On the other hand, if the code was based on a reworking of a similar algorithm on the 8080, the use of INY/DEY could make sense since the effects would be analogous to incrementing or decrementing HL on the 8080. On the 8080, code could rather efficiently do something like:

; Using Z80 mnemonics for 8080 opcodes
loop:
    ex      de,hl
start:
    ld      e,(HL)  ; Fetch link LSB
    inc     HL
    ld      d,(HL)  ; Fetch link MSB
    inc     HL
    ld      a,c     ; Line number LSB
    sub     a,(HL)
    inc     hl
    ld      a,b     ; Line number MSB
    sbc     a,(HL)
    jc      lp

For the 6502, a more efficient approach would be to have each line preceded by a length byte, allowing something like:

start:
     lda     linNum+1
     sta     lnTemp
     ldy     #2
     ldx     #0
loop:
     lda     (ptr),y
     cmp     lnTemp   ; Carry clear if looking for < what's there
     bcs     oddBall
     lda     ptr
     adc     (ptr,x)
     sta     ptr
     bcc     loop
     inc     ptr+1
     bcs     lp       ; Carry still set from before
oddBall:
     bne     notFound
     lda     lnNum    ; Do other byte of line # if we haven't yet
     sta     lnTemp
     dey
     bne     loop     : If not equal, we need to do LSB of line #
     clc
notFound:
     rts

Only 25 cycles per line in the common case, with an extra 7-8 cycles on each page crossing (depending upon whether the lda(ind),y crossed a page). So about twice as fast as the original code. This code would rely upon the 6502's ability to directly access two addresses relative to (ind) without having to manipulate any index registers, which is something the 8080 couldn't do, but exploiting that ability could have made things much faster. Further, it benefits from the fact that adding an 8-bit value to a 16-bit pointer is much faster than loading a 16-bit value.

Does the 8080 BASIC use routines which largely mirror the logic of the 6502 versions, thus suggesting that the 6502 was strongly derived from it? Or was Bill Gates simply following the coding style of a processor other than the 6502 (perhaps an 8080, or maybe PDP-10 or something else)?

hippietrail
  • 6,646
  • 2
  • 21
  • 60
supercat
  • 35,993
  • 3
  • 63
  • 159
  • 5
    They certainly made their 6502 assembly syntax as close to 8080 syntax (or at least their own 8080 syntax) as they could. Hard to say how much the 6502 source copies from the 8080 source without having them side by side to compare though. It doesn't look machine translated though. –  Oct 28 '17 at 18:37
  • 1
    @RossRidge: It certainly wasn't machine translated. On the other hand, someone who is familiar with the 8080 machine code [I don't think the source exists] might be able to say whether the structure bears any similarity to the 6502 version. If I were designing a BASIC interpreter from scratch for the 6502, I'd prefix lines with the length, and might store the characters of each line--as well as the characters of strings--in reverse order. Adding two to the address of a line (to skip over the line number) before executing would then allow code to notice the end of a line via "dey / bne". – supercat Oct 28 '17 at 18:43
  • 1
    The source code for the original 8080 version exists, but not online, you need to go a Harvard library to read it: https://www.theregister.co.uk/2001/05/13/raiders_of_the_lost_altair/ You can find the annotated disassembly the article refers to pretty easily on the web. –  Oct 28 '17 at 19:12
  • 1
    @RossRidge: Interesting article. After reading it, I looked for and found the aforementioned disassembly. The "find program line" function for the 6502 does somewhat resemble how the code could have been written on the 8080, but not how it actually was written. The code reads the next-line link (two bytes) for purposes of checking against zero, without saving it anywhere, then decrements HL back to where it was so it can call a function to push (HL) and (HL+1) while incrementing HL twice. Maybe that saved a byte or two of code, but in exchange for a huge speed penalty. – supercat Oct 28 '17 at 20:45
  • 2
    @RossRidge: I wonder if Gates and Allen suffered from "If all you have is a hammer, everything looks like a nail" syndrome. Memory-copy loops are very important to a program's performance, but the loops within the interpreter could easily have been made twice as fast for a few bytes. Instead of calling CMPDEHL within the loop, compute bc=65536-#bytes, and then do ldax d/stax h/inx d/inx h/inc c/jnz lp/inc b/jnz lp. Sure RST vectors are cute, but they're not exactly speedy. – supercat Oct 28 '17 at 20:57
  • Which 6502 MS-BASIC source code were you looking through, and where did you find it? – cjs Jul 26 '19 at 03:27

4 Answers4

14

I've seen a number of articles, such as this one which states it was based on 8080 BASIC, and this one which states that 8080 BASIC was first ported to the 6800, which was translated to the 6502. It would make sense to take the methods used in 8080 BASIC and use them on other processors.

Tim Locke
  • 4,811
  • 21
  • 36
  • 1
    Now I'm curious to see the 6800 code, but a google search isn't showing anything. I recall thinking that the Altair 680 seemed sluggish running BASIC even when using a 110-baud teletype. I wonder if the routines to move memory up and down make horrid speed/space trade-offs like on the 8080 version. – supercat Oct 28 '17 at 21:12
  • @supercat - necropost here, but interesting to note that the 6800's always finished at the bottom of every bench I can find. Gate himself suggested that this was an artifact of the machines, but even on faster versions of the 6800 ended up at the bottom of the list too. – Maury Markowitz Jun 30 '19 at 12:20
  • 1
    @MauryMarkowitz: The 6800 has a single index register which makes it hard to avoid frequent loads and stores of the index register when trying to perform a memory move. There are nonetheless some optimization opportunities like unrolling a byte-copy loop 2x which can offer some big performance wins versus simply using two X-register reloads per byte copied. – supercat Jun 30 '19 at 13:59
  • @supercat I've not come across a copy of the source code for 6800 BASIC, but this altair-680-basic repo does have the object code along with scripts and annotations to produce a (slightly) annotated disassembly of it. (Some of this work is still on development branches.) – cjs Jul 26 '19 at 03:22
11

Yes, Microsoft 6502 BASIC was clearly a port of their 8080 BASIC.

Unfortunately the original 8080 source code for Microsoft BASIC isn't easily available, so we can't compare it directly to that without going to the Harvard University library to read the paper copy.

But the Microsoft source code for a later version of their 8080 BASIC, BASIC-80 5.2 (actually marked 5.11 in the source) is available online. This is copyright 1982 (from the copyright notice in INIT.MAC), but contains material obviously from the original 8080 source code, such as the following comment in BINTRP.MAC:

;--------- ---- -- ---- ----- --- ---- -----
;COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
;--------- ---- -- ---- ----- --- ---- -----
;
;ORIGINALLY WRITTEN ON THE PDP-10 FROM
;FEBRUARY 9 TO  APRIL 9 1975

Comparing this with the 6502 source code we see substantial similarities in the overall organization of the code and almost exact correspondence of many routines within it.

START and INIT

For example, the entry point in both versions is at address 0 and contains JMP INIT; in both cases the INIT routine overwrites the target of this JMP with the address of a routine called READY that does a "warm start". After that there are a couple of words storing the addresses of routines to convert a floating point number to an integer and vice versa.

Here's the 8080 version from 1982:

        PUBLIC  START
START:
        PUBLIC  JMPINI
JMPINI: JMP     INIT                    ;INIT IS THE INTIALIZATION ROUTINE
                                        ;IT SETS UP CERTAIN
                                        ;LOCATIONS DELETES FUNCTIONS IF
                                        ;DESIRED AND
                                        ;CHANGES THIS TO JMP READY
                                        ;WARM START FOR ISIS
                                    ;OF THE ROUTINE TO CONVERT [A,B]
                                    ;TO A FLOATING POINT NUMBER IN THE FAC
    DW      FRCINT                  ;TURN FAC INTO AN INTEGER IN [H,L]
    DW      MAKINT                  ;TURN [H,L] INTO A VALUE IN THE FAC
                                    ;SET VALTYP FOR INTEGER

And the 6502 version from 1978:

START:  JMP     INIT            ;INITIALIZE - SETUP CERTAIN LOCATIONS
                                ;AND DELETE FUNCTIONS IF NOT NEEDED,
                                ;AND CHANGE THIS TO "JMP READY"
                                ;IN CASE USER RESTARTS AT LOC ZERO.
RDYJSR: JMP     INIT            ;CHANGED TO "JMP STROUT" BY "INIT"
                                ;TO HANDLE ERRORS.
ADRAYI: ADR(AYINT)              ;STORE HERE THE ADDR OF THE
                                ;ROUTINE TO TURN THE FAC INTO A 
                                ;TWO BYTE SIGNED INTEGER IN [Y,A]
ADRGAY: ADR(GIVAYF)>            ;STORE HERE THE ADDR OF THE
                                ;ROUTINE TO CONVERT [Y,A] TO A FLOATING
                                ;POINT NUMBER IN THE FAC.

STROUT

Another example is the STROUT routine, which clearly works in the same way, modulo the differences in processor architecture, and even has some comments that are nearly identical. Here's the 8080 version:

;
; PRINT THE STRING POINTED TO BY [H,L] WHICH ENDS WITH A ZERO
; IF THE STRING IS BELOW DSCTMP IT WILL BE COPIED INTO STRING SPACE
;
STROUI: INX     H                       ;POINT AT NEXT CHARACTER
STROUT: CALL    STRLIT                  ;GET A STRING LITERAL
;
; PRINT THE STRING WHOSE DESCRIPTOR IS POINTED TO BY FACLO.
;
STRPRT: CALL    FREFAC                  ;RETURN TEMP POINTER BY FACLO
        CALL    GETBCD                  ;[D]=LENGTH [B,C]=POINTER AT DATA
        INR     D                       ;INCREMENT AND DECREMENT EARLY
                                        ;TO CHECK FOR NULL STRING
STRPR2: DCR     D                       ;DECREMENT THE LENGTH
        RZ                              ;ALL DONE
        LDAX    B                       ;GET CHARACTER TO PRINT
        CALL    OUTDO
        CPI     13
        CZ      CRFIN
        INX     B                       ;POINT TO THE NEXT CHARACTER
        JMP     STRPR2                  ;AND PRINT IT...
        PAGE

And here's the 6502 version:

;
; PRINT THE STRING POINTED TO BY [Y,A] WHICH ENDS WITH A ZERO.
; IF THE STRING IS BELOW DSCTMP IT WILL BE COPIED INTO STRING SPACE.
;
STROUT: JSR     STRLIT          ;GET A STRING LITERAL.
;
; PRINT THE STRING WHOSE DESCRIPTOR IS POINTED TO BY FACMO.
;
STRPRT: JSR     FREFAC          ;RETURN TEMP POINTER.
        TAX                     ;PUT COUNT INTO COUNTER.
        LDYI    0
        INX                     ;MOVE ONE AHEAD.
STRPR2: DEX
        BEQ     PRTRTS          ;ALL DONE.
        LDADY   INDEX           ;PNTR TO ACT STRNG SET BY FREFAC.
        JSR     OUTDO
        INY
        CMPI    13
        BNE     STRPR2
        JSR     CRFIN           ;TYPE REST OF CARRIAGE RETURN.
        JMP     STRPR2          ;AND ON AND ON.

PTRGET

Even routines that are different due to feature changes are pretty clearly derived from common original source. For example, the PTRGET routine that that reads a variable name and fetches its value uses a pair of zero-page memory locations in the 6502 version to store the variable name, which is limited to two characters in that older version of BASIC. The newer 8080 code allows up to 40 characters in a variable name, but still stores the first two characters of the name in the C and B registers (memory was used in the 6502 because it has far fewer registers) and compares those first, only moving on to a further comparison with the rest of the name in a memory buffer if that fails.

And of course there's the similarity in the comments and routine name itself. The 8080 version:

; ROUTINE TO READ THE VARIABLE NAME AT THE CURRENT TEXT POSITION
; AND PUT A POINTER TO ITS VALUE IN [D,E]. [H,L] IS UPDATED
; TO POINT TO THE CHARACTER AFTER THE VARIABLE NAME.
; VALTYP IS SETUP. NOTE THAT EVALUATING SUBSCRIPTS IN
; A VARIABLE NAME CAN CAUSE RECURSIVE CALLS TO PTRGET SO AT
; THAT POINT ALL VALUES MUST BE STORED ON THE STACK.
; ON RETURN, [A] DOES NOT REFLECT THE VALUE OF THE TERMINATING CHARACTER
;
PTRGET: XRA     A                       ;MAKE [A]=0

And the 6502 version:

; ROUTINE TO READ THE VARIABLE NAME AT THE CURRENT TEXT POSITION
; AND  PUT A POINTER TO ITS VALUE IN VARPNT. [TXTPTR]
; POINTS TO THE TERMINATING CHARCTER.. NOT THAT EVALUATING SUBSCRIPTS
; IN A VARIABLE NAME CAN CAUSE RECURSIVE CALLS TO "PTRGET" SO AT
; THAT POINT ALL VALUES MUST BE STORED ON THE STACK.
;
PTRGET: LDXI    0               ;MAKE [ACCX]=0.

Comparison with Other Object Code

Though we don't have original source for an 8080 version of Microsoft BASIC from the same era as the 6502 source, we do have binaries that can be disassembled. Comparing this disassembly of Altair BASIC 3.2 we can also find substantial similarities in the code (including all the routines mentioned above), though of course there are no comments to compare.

I myself have been working on a disassembly of the 6800 BASIC that Microsoft wrote for the Altair 680B after they'd written the 8080 basic for Altair's 8800 but probably before they started the 6502 version. Both ports were done by Ric Weiland, and again the 6800 version shows the same substantial similarities in code organization and the details of how the particular routines work.

cjs
  • 25,592
  • 2
  • 79
  • 179
8

Bill Gates and/or Paul Allen likely did not write the 6502 version of Basic. Marc McDonald, one of the very first Microsoft employees, is reported (on the Wikipedia page for Applesoft BASIC, for instance) as having written the 6502 version of Microsoft Basic. But he would have had access to all the source code to Gates and Allen's 8080 BASIC, as well as the same DEC computer on which to run a cross-assembler.

hotpaw2
  • 8,183
  • 1
  • 19
  • 46
2

The original Altair BASIC came in three versions; 4k, 8k and Extended. The numbers referred to the amount of RAM required to run it, as BASIC was loaded to RAM from paper tape. The 4k version running in 4k of RAM had a whopping 790 bytes free.

The 4k and 8k versions used a 4-byte (6-digit) floating point format. The 4k version lacked string variables and some other less important features. The 8k version added strings, some more math functions, and peek and poke. Extended added a "double precision" floating point format. I think the double was actually the 6-byte (9-digit) format found in later 6502 versions, but I'm not sure.

UPDATE: the double-precision system was a 64-bit format. The 40-bit format became the standard during the 6502 port, largely replacing both earlier formats.

I suspect, as you do, that the code was deliberately "copied" as much as possible with an eye to ease of porting rather than code size or performance. New processors were popping up continuously, and tuning for each one would not only be difficult but ultimately fruitless as some of these would be doomed to die. Like the 6800.

BTW, supercat, are you playing with the source in your own versions?

Maury Markowitz
  • 19,803
  • 1
  • 47
  • 138
  • I'm not actively working on any BASIC interpreters, though I've toyed with some conceptual designs. The garbage-collector design leaves perhaps the most room for improvement. If strings were preceded by length (or any other byte which is guaranteed non-zero) and took up a minimum of three bytes, then on each GC pass where one had N bytes of contiguous storage available, one could relocate into that region all strings that were within an N-byte region elsewhere and observe the address of the string (if any) which crosses out of it. In most usage cases, the amount... – supercat May 18 '18 at 18:49
  • ...of free storage would grow significantly after each GC pass, making the process closer to O(NlgN) than O(N*N). A key point, though, would be that as the GC relocated each string it would be replaced with a relocation marker containing a zero byte and the new address. If the first byte of each string is the length, and zero-byte strings are represented as null pointers, this works out nicely. If strings aren't prefixed, however, this works out less well. – supercat May 18 '18 at 18:50
  • 1
    Do you know how the Altair 6800 BASIC did its memory-copy loops? The 6800 isn't amenable to doing anything very fast; I'd guess a self-modifying loop: lda extended / sta 0,x / inx / inc lsb_of_extended / inc b / bne loop" would be best, but wouldn't be at all surprised if the BASIC does something likeldx src / lda 0,x / inx / stx src / ldx dest / sta 0,x / inx / stx dest`. I really don't see any good way to do things without using self-modifying code or trashing SP. – supercat May 18 '18 at 20:43
  • @supercat - I have never seen the original 6800 source, only the 6502 port. Your musings on strings are VERY interesting to me. I am not an assembly programmer, but I have collected ideas wide and far that seem to present the possibility of major performance improvements to the 6502 code. I'm adding yours! – Maury Markowitz May 29 '18 at 16:02
  • @MauryMarkowitz Do you have this collection of ideas documented anywhere that's publicly available, such as a text file in a GitHub repo or something like that? – cjs Jul 26 '19 at 03:26
  • Hey Kurt, as I continued working on it I found it converging on QuickBASIC from a conceptual standpoint. Basically the two key concepts that any reasonably fast BASIC needs is to pre-parse the line so that doesn’t happen at runtime, and some smarts when dealing with numeric formats. But a lot of time is spend in line number lookups, for/next and doing really basic math like +1 in a loop. But I’d be happy to send you what I have although it’s based on an understanding of Atari basic. – Maury Markowitz Jul 27 '19 at 12:10
  • @MauryMarkowitz: I think the biggest "easy win" would be to modify the floating-point code so that if the exponent bytes of all operands were zero, it would use integer math for values up to 65,535, and otherwise any values with zero exponents would get converted to floating-point and processed normally. On my original C64 back in the day, I remember noticing that "FOR I=1 to 10000:NEXT" would produce a whine which decreased in pitch each time I passed a power of two, which implies that the code was spending a lot of time on normalization and denormalization. – supercat Jul 27 '19 at 16:46
  • @MauryMarkowitz: A design improvement which could have a really big payoff for garbage collection would be storing string literals as a "string literal token" followed by a length byte, and having every string on the heap be a minimum of three bytes, the first of which is non-zero. If GC was triggered when heap storage was down to some non-zero N, then the GC could in one pass relocate every object which is entirely within the bottom N bytes of the heap into the empty space, replacing it with a zero byte and the new address, and identify the lowest object... – supercat Jul 27 '19 at 17:03
  • ...that is not entirely within the first N bytes. All objects which were entirely within the bottom N bytes could have their references updated in that same pass. At the end of the pass, the GC could move the object that wasn't entirely within that bottom region and update references to it. This algorithm could then be repeated until a reasonable amount of space had been freed up, and once that was done all objects could be moved en masse back against the top of the used portion of the heap, and all references updated by adding a suitable offset to their addresses. – supercat Jul 27 '19 at 17:07
  • @MauryMarkowitz: Newer strings would be stored lower in the heap, and since newer strings are more likely to be abandoned, the size of the region that could be processed in each pass would tend to increase significantly with succeeding passes. An alternative approach which would be slightly less storage-efficient would be to precede every string with a "spare" byte, not used for storing the length; this would have the advantage of allowing the GC to count up in advance the total size of all strings, which would allow it to stop if future GC passes wouldn't free up enough space... – supercat Jul 27 '19 at 17:15
  • ...to be worthwhile. – supercat Jul 27 '19 at 17:15
  • @supercat - the 16-bit issue is one of the first I looked at, but in MS-derived versions, parsing the line takes longer than running it (so I've been told) so the actual savings is not huge. However, if you pre-tokenize, like Atari BASIC or Basic09, then the parsing goes ~= 0 and an int math package makes sense. You can even decide at edit-time - if everything on the RHS of the = is int and the ops have int versions, replace their tokens with the int equivalents and the = with "assign int to FP var", an invisible INT(). – Maury Markowitz Jul 29 '19 at 12:23
  • A big advantage to that approach is that you can pick and choose which ops you want the int package to have. So maybe just + and - and a handful of functions. Leave out things line SIN and you'll automatical failover lines with that code to the FP side at edit-time. And then you parse any constants into your FP format and leave them in that form in memory. IIF everything is int, call your FP-to-INT on the constants and save a few bytes and some copying time. – Maury Markowitz Jul 29 '19 at 12:28
  • So to strings... I'm not sure I understand your proposal. What is the non-zero byte at the front, a character or a flag/token? – Maury Markowitz Jul 29 '19 at 12:32
  • @MauryMarkowitz: The fact that the sound of the whine changes noticeably with each power of two reached by the loop index suggests that the time required isn't trivial. In Commodore Basic (and I think other MS derivatives) the loop step and limits are computed once at the start of the loop and kept on the stack, so they don't need to get processed from the source each time. Certainly for many purposes, having a "fast case" to handle parsing of numbers up to 65535 would be helpful as well. – supercat Jul 29 '19 at 14:05
  • @supercat In Altair 680 BASIC it's done using the SP, though it attempts to avoid trashing it. See my disassembly here – cjs Jun 05 '22 at 18:22
  • @cjs: Disabling data for the entire duration of a copy operation seems dodgy. I wonder if code might have been faster and safer if it used the B register to count up to 256 iterations at once, and after each group of 256 bytes restored the stack, enabled interrupts, and checked whether it was necessary to process any more groups? – supercat Jun 05 '22 at 18:29
  • Sure, at 21 cycles per byte, on a .75 MHz machine you could copy only 350 or so bytes without running the risk of losing a character at 9600 bps. But who was sending input that would cause a copy at 9600 bps? (Program loads were done in line order, and you needed to copy only for inserts. And a 9600 bps program load might overrun the tokenizer, anyway; I've seen it happen at 19.2k on faster machines.) Who even had the capability for 9600 bps input at the time anyway? – cjs Jun 05 '22 at 18:47