
I was reading about Prince of Persia over at POP Code Review. In that article, the writer printed an interview with Roland Gustafsson, the inventor of RWTS18 copy protection for the Apple II series.

Roland says:

It would have been better for them to have created a copy-system that created exactly the same format, RW18 and just make exact copies!! The experience was MUCH better, faster, etc.

As a kid, when I played these games (including C64 games) I seriously thought all of these "cracktros", "trainers", etc. were actually a part of the game! It wasn't until much later I realized that they were pirated games.

Anyway, my copy of Prince of Persia came on three disks. Turns out that's because it was cracked to fit Apple's standard RWTS16 format, which holds only 140K per disk, rather than the 157.5K-per-disk RWTS18 format of the original two-disk release.

So, my question is: how would it have been much better, other than a little less disk swapping? Even with the three-disk version, I don't remember it being terrible with constant swapping.

The only real negative I remember is that it crashed frequently. I suppose this is because of the cracked version? I never had the original to compare.

cbmeeks
  • The game was designed and optimised for this file format. It's a little like saying "How can memory optimisation make a game faster / better?", but for a disk instead of RAM. – wizzwizz4 Feb 09 '17 at 16:23
  • "Better" in the sense of "loads faster, less disk swapping". Not "better gameplay". It's easy to forget today how slow loading from a disk was. – dirkt Feb 09 '17 at 17:36
  • @dirkt not for me considering I still use my many "vintage" computers quite often. :-) – cbmeeks Feb 09 '17 at 18:17
  • @dirkt the Apple II disk was pretty fast compared to the C64 where even with fast-loaders a game could take 5 minutes to start. I remember playing games on my friends C64, which involved getting a snack and eating it while the game loaded. – Michael Shopsin Feb 09 '17 at 18:36
  • C64 disk I/O was slow due to the CPU doing all the bitwise work (due to chip/manufacturing issues preventing the CIA chips from being able to do the work), so that's not a great comparison point. :) The first computer/drive combo to work as originally intended over the serial cable was the 128/1571 – Joe Feb 09 '17 at 19:14
  • @Joe those "issues" were the direct result of Jack Tramiel according to the Brian Bagnall book. – cbmeeks Feb 09 '17 at 19:34
  • facepalms as owners of disk drives talk whose was slower. I had Atari 65 XE with a tape recorder. 30 minutes was a standard, and the game would often fail to load. – SF. Feb 10 '17 at 15:58
  • @SF. I had to save up for a tape recorder for my TI-99 4/a. Before that, my only software was BASIC games I typed each and every time from magazines. :-) – cbmeeks Feb 10 '17 at 16:03
  • @cbmeeks: My situation with Meritum-1 was similar, although luckily didn't last long. OTOH only two magazines existed that covered it, one or two pages each... – SF. Feb 10 '17 at 16:05
  • @SF. do you remember when magazines would have a BASIC program that did CRC checking on hex dumps of their code? Then we would sit with a ruler and type on pages and pages of hex values to play some cheesy game? Those were the days. :-) – cbmeeks Feb 10 '17 at 16:15
  • @cbmeeks: I only ever had patience to type in the CRC checker that way (...using a CRC checker in plain BASIC). Afterwards I never used the new CRC checker, lacking patience to type in anything else :) Besides, by then I had TURBO and access to lots of great Atari software, never mind a bounty of new games appearing - a true renaissance of the platform. – SF. Feb 10 '17 at 16:19
  • Technically, it didn't come on three disks because it was cracked to fit 140kb disks. It came on three disks because the crackers at the time couldn't make it fit onto two 140kb disks. However, there is a more recent crack that does come on two 140kb disks, and is 100% functionally equivalent to the original. – peter ferrie Feb 12 '17 at 02:15
  • @peterferrie that sounds great. Do you have links for this version? Thanks. – cbmeeks Feb 13 '17 at 13:14
  • I can't give out the exact location, but the search terms that you need are "prince of persia" "san inc crack". I can share how it was done, though: http://pferrie.host22.com/misc/lowlevel14.htm, and a variant was published in PoC||GTFO zine: http://pferrie.host22.com/papers/apple2_pop.pdf – peter ferrie Feb 13 '17 at 18:53
  • You didn't realise the game you didn't buy in a shop was pirated? – Alan B Nov 29 '23 at 12:49

2 Answers


The assertion that 4+4 is faster is false; it's easier, yes, but not faster. RWTS18 could read the entire track in one revolution, so it is the fastest. I know that 4+4 and 6+2 were also capable of reading in one revolution, but I'm not sure about 13-sector... I don't think anyone ever tried. :-)

So RWTS18 gave the game developer faster speed, more space with the requirement that they had to manage disk space manually. The fact that it made it harder to crack was a side benefit but not the main goal, believe it or not! That crack that required 3 disks was chuckle-worthy to me and only when the true 2 disk version was recently created was I impressed with the results. (That crack used RWTS16 with data compression, something that was not widely used back in the day.)

RDOS was similar in that the space allocated for a file remained constant. This was to allow for as small an OS as possible.
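A rough sketch of the arithmetic behind Roland's claim, using the per-revolution byte counts he quotes in the comments below (0x1200 for RW18, 0x1000 for RWTS16, about 0x0C00 for 4+4); the 300 RPM spindle speed is my assumption, the nominal speed of the Disk II:

```python
# Per-revolution byte counts quoted in the comments below; the 300 RPM
# spindle speed is an assumption (nominal for the Apple Disk II).
RPM = 300
schemes = {"RW18": 0x1200, "RWTS16": 0x1000, "4+4": 0x0C00}

for name, nbytes in schemes.items():
    # Best-case throughput if a full track is decoded in one revolution:
    # nbytes per revolution, RPM/60 revolutions per second.
    rate_kb = nbytes * RPM / 60 / 1024
    print(f"{name:7s} {nbytes:5d} bytes/rev  ~{rate_kb:.1f} KB/s")
```

That works out to a best case of about 22.5 KB/s for RW18 versus 20 KB/s for RWTS16 and 15 KB/s for 4+4 — consistent with the ordering Roland gives.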

rolandgust
  • Welcome to Retrocomputing Stack Exchange. Thanks for the answer. I didn't know that RDOS used a custom filesystem. Have you read the [tour]? – wizzwizz4 Jun 19 '17 at 04:36
  • Sorry, no, I tried to reply but didn't have the necessary points! :-) A bit frustrating when I know the exact answer because it's discussing something that I created! :-) I'm happy to contribute in those areas where I was involved back in the day. :-) – rolandgust Jun 20 '17 at 05:33
  • Don't worry; you'll have more reputation soon. Perhaps you could answer some unanswered questions. – wizzwizz4 Jun 20 '17 at 07:07
  • I would expect that 4+4 would have allowed faster read speeds under various RAM constraints, and likewise write speeds. Store D as ((D>>1) and $55)+$95 and (D and $55)+95 and it can be decoded as ((B1 and $55)<<1) +(B2 and $55) without needing any other lookup table. It may be possible to process 6+2 in real-time if one has enough lookup tables, but only if one doesn't need that RAM for other things. – supercat Jun 21 '17 at 16:25
  • the amount of time available to read nybbles off the disk was constant... the 4x4 routines wasted time waiting for more data, the 16 and 18 sector routines used that time to effectively store more data. One revolution of the disk... that was the constraining factor. – rolandgust Jun 22 '17 at 17:16
  • @rolandgust: Data comes off the disk at a rate of roughly 32 cycles per "nybble" [octet], but unless it is written more slowly than usual, code should be prepared for it to arrive somewhat faster (e.g. once every 28 cycles). If code uses six cycles for a "LDY/BPL" sequence to wait for each byte, that would leave 22 cycles for other processing. Processing data that's stored using anything other than 4+4 encoding would require either using lookup tables, or reading sectors without translating the data and cleaning it up afterward. – supercat Jul 05 '18 at 15:50
  • It would be super-rare for memory to be so constrained that you couldn't have lookup tables for 6+2 encoding. Sure, the 4+4 code is 1/2 the size as well. :-) In the real world I worked in, there was always room for RW18 which was the fastest way to load levels and store more data. 4+4 is NOT faster. :-) – rolandgust Jul 06 '18 at 19:50
  • The bottom line constraint for speed was how much data could you load in a single revolution of the disk, assuming you could decode it as you loaded it. RW18 stored 0x1200 bytes, RWTS16 stored 0x1000, 4+4 stores about 0x0C00 and all can be decoded in single revolution. RW18 wins. :-) – rolandgust Jul 06 '18 at 20:02
  • @rolandgust: Incidentally, I've managed to write code to read a DOS 3.3 format track in a single shot. A variation which uses a page of scratchpad for each sector (in addition to the buffer) and cleans up data after reading a track fits in a 256-byte boot block. Looking at the boot ROM, I find it a bit strange that it required that code at $0801 which wanted to use the ROM routines to go through some gymnastics to find its slot and had to use a fixed offset within it, rather than having the ROM do a JSR $0801, and allowing the code there to adjust the sector number and do an RTS if needed. – supercat Nov 27 '23 at 16:09
  • @rolandgust regarding the two-sided PoP, I managed to pack all of it onto a single side now. :-) – peter ferrie Feb 08 '24 at 05:58
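supercat's comments above describe the appeal of 4+4: each data byte is split across two disk bytes so that decoding needs only a shift and two masks, with no lookup table. Here is a minimal Python sketch of the conventional 4+4 scheme as used in standard DOS address fields (the exact constants in supercat's comment differ slightly; this uses the usual OR-with-$AA form):

```python
def encode_44(d: int) -> tuple[int, int]:
    """Split one data byte into two 4+4 disk bytes.

    Odd-numbered bits go in the first byte, even-numbered bits in the
    second; the gaps are filled with 1s (OR $AA) so every disk byte has
    its high bit set and no long runs of zeros.
    """
    return ((d >> 1) | 0xAA, d | 0xAA)

def decode_44(b1: int, b2: int) -> int:
    """Recombine two 4+4 bytes - just masks and a shift, no table."""
    return (((b1 & 0x55) << 1) | (b2 & 0x55)) & 0xFF

# Every byte value round-trips
assert all(decode_44(*encode_44(d)) == d for d in range(256))
```

The price of that simplicity is density: two disk bytes per data byte, versus roughly four disk bytes for every three data bytes under 6+2.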

Less disk swapping, but also faster loading. Much of the performance improvement in Apple II fast DOS implementations (including ProDOS) came from reducing the latency between reading sectors - and this happens naturally (in a game) when there are more sectors per track and you read them all: the sector order is arranged so that by the time you've read a sector and finished processing it, the next sector you need is right under the drive head, ready to be read.

Roland in another interview: (with some similar answers - he must use a FAQ!)

The true speed-up came with sector-latency reading... where the read routines would just start reading whatever was under the head, I used this with the 18 sector routines so the maximum latency was 1/6 revolution of the disk.

At the same link there's also mention of pirates writing dedicated copiers ("Gogsmith") to reproduce specific 18-sector titles - exactly what Roland suggests in the quote in cbmeeks's question.
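To put Roland's 1/6-revolution figure into time terms (assuming the Disk II's nominal 300 RPM, which is not stated in the interview):

```python
# Worst-case rotational wait before useful data arrives, assuming the
# Disk II's nominal 300 RPM spindle speed (an assumption, not a quote).
RPM = 300
rev_ms = 60_000 / RPM  # 200 ms per revolution

# Waiting for one specific sector means a near-miss costs almost a full
# revolution; starting to read whatever is under the head caps the wait
# at one sector time - 1/6 revolution in Roland's 18-sector layout.
naive_worst_ms = rev_ms        # ~200 ms: just missed the sector
rw18_worst_ms = rev_ms / 6     # ~33 ms: Roland's 1/6-revolution bound

print(f"full revolution: {naive_worst_ms:.0f} ms, "
      f"RW18 worst-case wait: {rw18_worst_ms:.1f} ms")
```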

But as wizzwizz4 implies, although the performance and storage increase were impressive, it's unlikely it would have been suitable for a general-purpose Disk Operating System, since the requirements are quite different. For instance after reading or writing a sector there would be no guarantee that the next sector needed would be under the drive head.

Of course Roland also wrote RDOS for SSI - 13 and 16-sector versions, but not an 18-sector one.

Here's a comparison of nibble encodings and their storage capacities:

Nibbles   Sectors/Track   Bytes/Disk   kB/Disk
4+4            10             89,600     87.5
5+3            13            116,480    113.75
6+2            16            143,360    140
6+2            18            161,280    157.5
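The capacities above follow directly from 35 tracks × sectors/track × 256 bytes/sector:

```python
# Reproduce the capacity table: 35 tracks x sectors/track x 256 bytes.
TRACKS, SECTOR_BYTES = 35, 256

for nibbles, sectors in [("4+4", 10), ("5+3", 13), ("6+2", 16), ("6+2", 18)]:
    total = TRACKS * sectors * SECTOR_BYTES
    print(f"{nibbles}  {sectors:2d} sectors/track  "
          f"{total:6d} bytes  {total / 1024:.2f} kB")
```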
Nick Westgate
  • What do 4+4, 5+3, etc. mean? I would guess the fastest format uses two bytes of raw disk data per byte of useful data storage, but what do the other numbers mean? – supercat Feb 10 '17 at 16:51
  • I'm familiar with the principles of GCR, but I'm not sure about the numbering. Since each 8-bit chunk of storage on the disk needs to hold one of 81 distinct values, is the chart simply saying that each 8-bit chunk is used to hold 4, 5, or 6 bits of useful data? Does the number after the + mean anything? – supercat Feb 10 '17 at 19:58
  • Yes, but from a different perspective. The encoding name refers to the way source data is split into disk nibbles. 6+2 means 6 bits get translated into one nibble, two into another (with 4 other bits to make up to six total). Each encoding scheme uses a different number of possible nibble values, not always 81. – Nick Westgate Feb 10 '17 at 20:23
  • Would 6+2 say anything about how the second byte of data should be encoded [e.g. I think a fast approach using one 128-byte table would build groups of three bytes from groups of four raw chunks (8 bits each before decoding or six after) by having twelve bits mapped directly but having the other twelve stored as the xor of 2-3 bits from the original data; would that still be called 6+2, or something else?] – supercat Feb 10 '17 at 22:40
  • The way the nibble encoding name is usually used is for the whole process, Any difference such that DOS 3.3 couldn't read it would be a different scheme, though your scheme would be similar if it used the same nibble values. The change from 5+3 (32 nibble values used for data) to 6+2 (64 values) required a new sequencer PROM on the disk controller card. – Nick Westgate Feb 11 '17 at 07:35
  • And with the reduced gaps there would also be problems writing sectors randomly, instead of writing the whole track at once as it's probably done to create the disks. – dirkt Feb 11 '17 at 08:32
  • @NickWestgate: Of the 128 possible octet values that start with a leading "1", 47 contain three or more consecutive 0's and would require fancier read circuitry. Another 48 contain two or more zeroes but not 3, and would have required fancier read circuitry except that Woz figured out a way of doubling up decoding states when the MSB is set. – supercat Feb 11 '17 at 16:44
  • @NickWestgate: I find it somewhat curious that Apple didn't use a 4+4 boot sector format, even on higher-density disks, since that would have allowed the boot PROM to time out in case of failure and also maintained boot compatibility with older disks, and would have only minimally affected disk capacity. Way back when, I looked at adding a timeout and found out that the amount of ROM needed to set up X with the controller's offset from $C080 was about the same as the amount of ROM needed to add a timeout, so a version that was hard-coded for slot 6 could... – supercat Feb 11 '17 at 16:58
  • ...repeatedly output ERR or shut off the motor in case of failure (I couldn't free up enough space to shut off the motor and output anything, however). Using a 4+4 scheme should have freed up a lot of space, though, and a 256-byte boot sector which started with the motor on and X loaded with the controller offset could have loaded everything else. Using a 4+4 sector might have meant the loss of one sector of storage, but that would seem a small loss. – supercat Feb 11 '17 at 17:03
  • I'm super late but whatever: 4+4 is like FM, not MFM. It's a fixed pattern of clock bit, data bit, clock bit, data bit, all exactly window aligned, and hence has exactly the same data density as FM. MFM has double the clock rate but guarantees about gaps between bits, so another way to think of it is using the same window size as FM and therefore needing the same quality of data, but inferring additional information from the position of flux transition within their window. MFM has twice the data density of FM, and is about 26% more efficient than 6+2. – Tommy May 07 '18 at 18:17
  • @Tommy: MFM as used on PC floppy disks requires the ability to have a phase transition on every cycle. Some RLL formats push densities by about 50% by doubling the clock speed but forbidding phase transitions on consecutive cycles. I think the difference between FM and MFM is that the latter limits the number of consecutive phase transitions within a data stream in exchange for allowing two consecutive cycles without a phase transition, thus making it possible to use the presence of a higher number of consecutive phase transitions as a sync marker. – supercat Jul 05 '18 at 15:42
  • @supercat I think we might just be talking at crossed purposes: the rule for MFM is insert a clock bit only if the two adjacent data bits are both 0; signify special marks (index, id, data) by dropping a clock that should be there. So it's any cycle but definitely not every. So it uses the same analogue density media as FM and never outputs two consecutive transitions. Which I argue is therefore the same as FM in density terms of windows filled or empty, but with sub-window positioning. That's not how you implement it, but I think it's correct. – Tommy Jul 05 '18 at 17:25
  • @Tommy: Okay, I just looked up MFM and I had been slightly mis-remembering things. It requires that the time between flux reversals be in the range from 2 to 4 cycles (as distinct from 1-2 for FM), which I'd seen described as having 1 to 3 zeroes between bits (and confused with having 1-3 cycles between flux reversals). I think my confusion stemmed from the fact that the Disk II hardware allows a maximum of three cycles between flux reversals. – supercat Jul 05 '18 at 18:45
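As a toy illustration of the clocking rules being discussed (this is a sketch of the textbook FM/MFM rules, not anything Disk II-specific): FM writes a clock 1 before every data bit, while MFM writes a clock 1 only between two adjacent 0 data bits. Expanding a few data bits into their clock/data windows shows MFM keeping flux transitions 2-4 windows apart, matching supercat's figures:

```python
def fm_encode(bits):
    """FM: a clock 1 precedes every data bit (2 windows per data bit)."""
    out = []
    for b in bits:
        out += [1, b]
    return out

def mfm_encode(bits, prev=0):
    """MFM: a clock 1 only between two adjacent 0 data bits.

    Same window count as fm_encode, but transitions (1s) end up 2-4
    windows apart instead of 1-2, so the bit clock can be doubled on the
    same medium. `prev`, the data bit before the stream, defaults to 0.
    """
    out = []
    for b in bits:
        out += [1 if prev == 0 and b == 0 else 0, b]
        prev = b
    return out
```

For data bits 1,0,1 FM yields 1,1,1,0,1,1 while MFM yields 0,1,0,0,0,1 — under MFM the transitions are never closer than two windows, which is Tommy's point about inferring data from transition position rather than transition density.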