
(This question also applies to other Game Boy games, but Pokémon seems to be one of the best-documented.)

Pokémon Red & Blue (Red & Green in Japan) is a pair of Game Boy games about capturing "Pocket Monsters", or "Pokémon", by fighting them with other Pokémon. Since their release, many bugs and glitches have been found[1], many of which are caused by overflow errors.

The Game Boy contained a Sharp LR35902 CPU, a Z80 derivative that was something of an intermediate design between the Z80 and 8080:

This processor is similar to an Intel 8080 in that none of the registers introduced in the Z80 are present. However, some of the Z80's instruction set enhancements over the stock 8080, particularly bit manipulation, are present. Still other instructions are unique to this particular flavor of Z80 CPU.[2]

Both the Intel 8080[3] and, by extension, the 8080-compatible Zilog Z80[4] have a carry bit that is set on operations that overflow. A derivative of these chips would be expected to have one too; indeed, this is not documented as a difference between the Z80 and the LR35902[5].
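To make the carry bit concrete, here is a minimal sketch in Python (a model, not Game Boy code) of what an 8-bit ADD does with it. The function name `add8` is made up for illustration; the point is that the CPU records the overflow in a flag, and it is up to the program to test it.

```python
def add8(a: int, b: int) -> tuple[int, bool]:
    """Model an 8-bit ADD on an 8080-style CPU.

    Returns the result truncated to 8 bits and the carry flag, which is set
    when the true sum does not fit in a byte (unsigned overflow).
    """
    total = a + b
    return total & 0xFF, total > 0xFF


# The flag is set, but nothing happens unless the code tests it afterwards
# (on the Z80 and the LR35902, with a conditional jump such as JP C or JR C).
result, carry = add8(200, 100)
print(result, carry)  # 44 True
```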

Now to the part that confuses me: the LR35902 CPU had an overflow flag, and there are many exploitable glitches in Pokémon Red that are caused by overflows; there are glitches caused by overflows in most places where overflows can occur. There must be a reason the developers didn't use this flag on so many occasions, but I can't work out what it is.

Why are there so many overflow bugs in Pokémon Red?

user3840170
wizzwizz4
  • At least in the US games, you have to do some pretty non-obvious work in order to trigger these bugs, so they likely wouldn't be found by manual testing (or maybe even automatic testing). – Batman Oct 30 '16 at 18:26
  • @Batman Well, the "wet" 255 item glitch doesn't require you to do too much (buy a drink, chuck the drink, give the drink away, chuck everything else, modify lots and lots of memory values). – wizzwizz4 Oct 30 '16 at 18:31
  • Just a thought - I would consider the popularity of Pokemon Red and Blue to be a significant factor to the number of their known bugs. Another way of saying this is that perhaps the number of bugs in the Pokemon games is not that different from an average Game Boy game, but there are more people who are actively looking for them, and more people who are interested in exploiting them. – tehDorf Nov 02 '16 at 23:52
  • BTW pokemon was not an early game boy game. It was a relatively late and big game (some gameboy carts are only 32K, Pokemon R/B/Y was a megabyte) – Peter Green Sep 09 '17 at 12:37
  • What do you mean by overflow errors? The overflow flag in a Z80 is used for detecting arithmetic overflow for signed numbers. For example, 0x7f + 0x01 will not set the carry flag (there is no overflow if this is interpreted as unsigned arithmetic) but will set the overflow bit. To me, when people say "bug caused by overflow error" I usually think of buffer overflow, not arithmetic overflow. – JeremyP Sep 12 '17 at 10:45
  • @JeremyP To be fair, there were a lot of both! :-) But I mostly meant arithmetic, because buffer is pretty much standard everywhere. – wizzwizz4 Sep 12 '17 at 15:41
  • It was Game Freak's first project. They just were sloppy programmers. This function merely divides by four, in the usual Game Freak style of doing things. – Maya Jun 06 '18 at 16:35
  • Pedantic comment: Overflow is not carry!! The Gameboy CPU had carry; it did not have overflow. Therefore, you could detect unsigned overflow, but not signed. As long as your number was 8 bits. Anything longer would probably use the 16 bit addition instruction, which had no carry flag. That aside, checking for overflow when it wasn't reasonably expected in gameplay, making each math op take 3 times as long would eat more battery power and ROM space, two very definite negative points. – Orion Nov 22 '18 at 01:41
  • @NieDzejkos I can't follow what's going on here. Are they trying to divide HL by 4? – puppydrum64 Oct 16 '21 at 02:43



I cannot speak about Pokémon in particular, but as a programmer for ~30 years, I'll answer thus: either laziness, incorrect assumption, or surprise.

  • Laziness
    After an operation that overflows, you need to write extra code to check for the overflow, and then decide what to do about it. That's extra time, and extra work.

  • Incorrect assumption
    (Often characterised by "That'll never happen!") Have you ever visited a website, or used a program, and an error box popped up with something like "User won't see this"? That's because the programmer may have considered the overflow case, but decided it wouldn't happen so didn't code for it (other than put in a debug statement that they forgot to remove). Needless to say...

  • Surprise
    For example, say that at a point early in the game you can do something that trebles your points. Because it's early, the programmer doesn't even consider checking for overflow, since the score can't be that high yet. But if a high-scoring player works their way back to that point and triggers it, the score might overflow. The player surprised the programmer.
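The "surprise" case can be sketched in a few lines of Python, assuming (hypothetically) that the score lives in a single unsigned byte. `treble_wrapping` mimics unchecked 8-bit arithmetic; `treble_checked` shows the extra work a programmer would have had to write, here clamping at the maximum. Both names are invented for this example.

```python
SCORE_MAX = 0xFF  # hypothetical: the score is stored in one unsigned byte


def treble_wrapping(score: int) -> int:
    """Unchecked 8-bit arithmetic: only the low 8 bits of the product survive."""
    return (score * 3) & SCORE_MAX


def treble_checked(score: int) -> int:
    """The skipped extra work: clamp (saturate) at the maximum instead of wrapping."""
    return min(score * 3, SCORE_MAX)


for score in (50, 200):  # an early-game score, then a late-game one
    print(score, treble_wrapping(score), treble_checked(score))
# 50 150 150
# 200 88 255
```

Clamping is only one possible policy; either way it costs extra code that is easy to skip when the early-game score can't get anywhere near the limit.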

There are a couple of other, lesser reasons: maybe they ran out of code room on the ROM and had to strip out error handling code; or maybe they wanted those Easter Eggs to be found; but it's almost always one of the top three.

John Burger
  • So... nothing specific to the Game Boy? Huh. I assumed there would be. – wizzwizz4 Oct 29 '16 at 14:08
  • "This will never happen" (and variants thereof) feel to me like the worst possible assumption to make in programming. Either it can happen, and if so it might be worth checking for; or it can't happen, and there's no point in checking for it (if it happens anyway, depending on capabilities, either crash hard or muddle through). – user Sep 09 '17 at 16:59
  • @MichaelKjörling And yet people do it all the time. The most egregious example in my experience was a web portal where there was a theoretical chance that two people could be served the same session id due to it consisting of two application servers. The developer calculated the chance of a clashing id and came up with the answer of billions to one, forgetting the "birthday paradox" effect and that hundreds of people were using the system every day and this made a clash virtually inevitable. The consequence was one user seeing the confidential data of another user. – JeremyP Sep 13 '17 at 09:53
  • @aCVn: Many programs are subject to the constraints: (1) given valid data, produce correct (and thus valid) results; (2) given maliciously-constructed data, don't do anything harmful, but behave arbitrarily otherwise. Ensuring fully-predictable behavior when given invalid inputs requires a lot of validation code, but merely ensuring harmless behavior generally requires much less. Problems primarily arise when there's confusion about which parts of the code are responsible for the small amount of validation that is necessary. – supercat Apr 11 '19 at 15:34
  • Obligatory xkcd: https://xkcd.com/2200/ – John Burger Sep 10 '19 at 00:15
  • I'd also add a combination of laziness and surprise - that is, when the game encounters an unknown state, the question then becomes whether it should crash or not. Handling overflows as Pokemon did meant that, if there was a cosmic ray event that caused a bitflip to an unexpected Pokemon number encounter, it handled it by just trying to interpret it as a Pokemon (As MissingNo.) - rather than a direct Crash To...No Desktop. The user experience is to some extent improved by it just working when something went wrong. – Alexander The 1st Oct 21 '21 at 07:59
  • How long did it take to find the 256-level crash in Pac-Man? – Thorbjørn Ravn Andersen Aug 12 '22 at 21:43

Overflow doesn't mean what you think. That flag exposes the internal ALU carry from bit 6 into bit 7. It's needed when you are handling the most significant byte of a two's-complement number, because the carry flag can't serve that purpose there: its meaning is scrambled by the sign bit in the MSB.

When you aren't adding or subtracting two's-complement numbers (the MSB isn't a sign bit but just another bit), it has no meaning. Nor is it set when increment or decrement instructions wrap around from 255 to 0 or from 0 to 255.
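A minimal Python model of the two flags for an 8-bit add may help. The carry test is the unsigned one; the overflow test uses the formula from Tommy's comment on this answer, which matches how the Z80's P/V flag behaves after an addition. The function name `add8_flags` is invented for illustration, and note that the Game Boy's LR35902 has no P/V flag at all (as other comments point out), so only the carry half applies there.

```python
def add8_flags(a: int, b: int) -> dict:
    """Model the carry and (Z80-style) overflow flags for an 8-bit ADD."""
    r = (a + b) & 0xFF
    carry = (a + b) > 0xFF                        # unsigned overflow: C flag
    # Signed overflow: both operands have the same sign, result sign differs.
    overflow = bool(~(a ^ b) & (a ^ r) & 0x80)    # Z80 P/V flag after ADD
    return {"result": r, "carry": carry, "overflow": overflow}


# 0x7F + 0x01: 127 + 1 leaves the signed range, but no carry (JeremyP's example).
print(add8_flags(0x7F, 0x01))
# 0xFF + 0x01: unsigned wraparound to 0 sets carry; as signed, -1 + 1 = 0 is fine.
print(add8_flags(0xFF, 0x01))
# 0x80 + 0x80: -128 + -128 sets both flags (Tommy's example).
print(add8_flags(0x80, 0x80))
```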

Janka
  • I thought it exposed the internal ALU carry from bit 7 -> /dev/null. I'd say that this is the answer I'm looking for. – wizzwizz4 Oct 30 '16 at 16:49
  • @wizzwizz4: No, that one is the regular carry flag. Programmers could have (and should have) checked that on the 8080 already. However, that's the general sloppiness John Burger mentioned. Meh, I just use INR and that's it. – Janka Oct 30 '16 at 16:53
  • @wizzwizz4: (and future readers) see Understanding Carry vs. Overflow conditions/flags for signed vs. unsigned. Most CPUs that have a CF and an OF set them this way. But there's a terminology gap here: "overflow" can also mean "exceeding the range of the type". For IEEE float, the overflow behaviour is saturation to +/-Infinity. For integers on normal CPUs, the overflow behaviour is wraparound (which you can detect with OF for signed wraparound or CF for unsigned wraparound). – Peter Cordes Dec 15 '17 at 04:45
  • Some CPUs have a saturating integer add, which is useful for audio/video for example. (multiply the brightness -> saturate to white instead of wrapping to black). Saturation is another kind of overflow behaviour. So it is correct to talk about unsigned overflow in a computer science sense. Anyway, detecting integer overflow efficiently is basically an unsolved problem in computing. Most languages don't provide checks, and in asm putting a conditional branch after most instructions would be horrible for perf and code-size (plus you need error-handling code as a branch target). – Peter Cordes Dec 15 '17 at 04:49
  • I'm still not sure that's an accurate description of overflow; -128 + -128 has no carry from bit 6 to bit 7 but generates overflow. The test for addition is: initial signs are the same, final sign is different. r = a + b; overflow = ~(a ^ b) & (a ^ r) & 0x80; – Tommy Apr 11 '19 at 18:20
  • Thing is, the Game Boy can't jump, call, or return based on the overflow flag. I'm pretty sure it doesn't have one at all (even if it did all the commands that use it are gone so it doesn't matter) – puppydrum64 Oct 16 '21 at 02:40
  • @PeterCordes: The first step to handling overflow efficiently would be to define better language semantics to detect what programmers would generally need to know: whether any integer operations within a block of code might have yielded an observably arithmetically incorrect result because of overflow. Allowing an implementation to, at its option, perform out-of-range arithmetic in arithmetically correct fashion without flagging an overflow could in many cases allow some major optimizations that would not be possible if all overflows are specified as trapping. – supercat Oct 29 '21 at 16:35

I think your premise is wrong.

Firstly, "overflow" in most cases doesn't mean pure arithmetic overflow; it means exceeding some other limit, and checking such limits would require more than a single extra instruction.

Secondly, in many of the glitches involving overflow, the overflow is only a secondary part of the glitch.

  • Using a Rare Candy on a level 255 Pokémon results in a level 0 Pokémon, but the player isn't supposed to have a level 255 Pokémon in the first place.
  • Encountering a "glitch Pokémon" corrupts the Hall of Fame due to an overflow in the sprite decompressor, but the player shouldn't be encountering glitch Pokémon in the first place.
  • Psywave multiplies the user's level by 1.5; this can only overflow if the Pokémon is above level 100, which again should never happen.
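The first and third examples come down to a value held in one unsigned byte. A rough Python sketch, assuming (as a simplification, not the game's actual routine) that the level is incremented with no cap and that Psywave's 1.5x is computed as `level + level // 2` in a single byte; both function names are hypothetical.

```python
def rare_candy(level: int) -> int:
    """Unchecked increment of a level stored in one byte: 255 wraps to 0."""
    return (level + 1) & 0xFF


def psywave_limit(level: int) -> int:
    """Hypothetical 1.5x level computed in one byte (assumed formula)."""
    return (level + level // 2) & 0xFF


print(rare_candy(99), rare_candy(255))          # 100 0
print(psywave_limit(100), psywave_limit(171))   # 150 0
```

Under this assumed formula the value fits in a byte for every level up to 170, so the wraparound can only appear at levels a normal game never reaches, which is exactly why nobody needed to check for it.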

As to the broader question of why Pokémon Red/Blue has so many "interesting" glitches (rather than boring crash bugs), I think it's a combination of the game's large size (a whole megabyte) and the limitations of 8-bit platforms.

The very limited total RAM (8K video RAM, 8K work RAM, 4x8K cart RAM, less than 1K high RAM), along with the fact that it's generally faster to use hard-coded memory locations on 8-bit platforms, means that the same hard-coded memory location will be used for different things in different contexts. If you can set a memory location in one context and then cause it to be read in another, you can end up with memory locations set to predictable values outside the ranges the developers believed they could have.
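A toy Python model of that effect, with 8K of work RAM as a `bytearray`. The offsets `SHARED` and `TABLE` and both routines are hypothetical, not real Pokémon addresses: one routine writes a byte that may legitimately reach 0xFF, and another reads the same byte assuming it is 0-3 and uses it as an unchecked table index.

```python
WRAM = bytearray(0x2000)   # 8K of work RAM (mapped at 0xC000 on the Game Boy)
SHARED = 0x0100            # hypothetical byte reused by two unrelated routines
TABLE = 0x0200             # hypothetical 4-entry table

WRAM[TABLE:TABLE + 4] = bytes([10, 20, 30, 40])
WRAM[TABLE + 0xFF] = 0x99  # unrelated data that happens to sit further along


def menu_routine(selection: int) -> None:
    """Context A: stores a value that can legitimately be anything from 0 to 255."""
    WRAM[SHARED] = selection & 0xFF


def battle_routine() -> int:
    """Context B: assumes the shared byte holds 0-3 and never checks it."""
    slot = WRAM[SHARED]
    return WRAM[TABLE + slot]   # no bounds check, like an unchecked indexed load


menu_routine(2)
print(battle_routine())   # 30: the expected case
menu_routine(0xFF)
print(battle_routine())   # 153 (0x99): a predictable value from outside the table
```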

Combine that with a large non-linear game written on a platform where resources were precious (so you wouldn't initialise something or check something unless you thought it needed doing) and there are lots of opportunities for unexpected combinations.

Peter Green
  • BTW, I read of an interesting "virus" someone came up with for one of those games. When connecting to another cart it will effectively install a patch to the other cart in the ROM, which will hide a Pokemon under a truck that was rumored to have the Pokemon under it (but never did). The author pondered what would have happened if such a virus had been written back in the day when Nintendo was denying rumors of the hidden Pokemon. – supercat Sep 17 '17 at 00:59
  • I find that claim hard to believe. – Peter Green Sep 17 '17 at 01:18
  • I don't have the youtube link handy, but someone posted a description of the mechanisms used. IIRC, a substantial portion of the battery-backed storage in the cart isn't used by the game, and it's possible to corrupt some of the game data in such a way as to cause execution to transfer to the otherwise-unused space. If the video was a joke, the author went through an awful lot of trouble to concoct consistent technical explanations for how things tied together. – supercat Sep 17 '17 at 03:11
  • Whether or not the claim itself is true, it is certainly possible. – forest Mar 09 '18 at 03:43
  • Actually, writing to the ROM itself would not be possible (ROM is "Read-Only Memory"). It would be possible though to write a corrupted save file that exploits the game and changes its behavior. – forest Mar 24 '18 at 21:37
  • Writing to the rom is not possible, so the question then becomes can you find an exploit that lets you hook your code in at the right time to trigger the effect you want without having to effectively rewrite the game (which there is nowhere near enough space for). – Peter Green Mar 25 '18 at 01:58
  • @Greenonline: Yup. That looks like it. – supercat Jun 07 '18 at 15:04
  • @PeterGreen: See the link posted by Greenonline. – supercat Jun 07 '18 at 15:05
  • Pokémon Red and Blue actually had 32K of RAM on the cart, split between save data ("SRAM") and work space for the game ("WRAM"). See https://datacrystal.romhacking.net/wiki/Pok%C3%A9mon_Red/Blue:RAM_map for a breakdown – Kaz Apr 09 '19 at 14:24
  • Thanks, I knew the rom was banked but I didn't realise cart ram was also banked. – Peter Green Apr 11 '19 at 09:55
  • Yeah, there's a buffer overflow in Gen 1 Pokemon games that can be triggered with just a joypad sequence in the right spot in game. It allows arbitrary code entry after that (again with joypad) and you can hook in a routine entered into RAM and alter the game arbitrarily (up to what will fit in the bits of free RAM). – RETRAC Jul 17 '21 at 21:12

Electronic Gaming Monthly no. 124 from 1999 notes that the original Japanese Pokemon games had a long, difficult development process. The source code was so bad that when it came to doing the western versions the original Pokemon Red game was recreated using the newer Pokemon Green code, but even that was not an easy task.

When you have poor quality source code, cartridge size limitations, and a long development process which towards the end will inevitably focus on getting the game released rather than finding and fixing rare bugs, you end up with software like Pokemon Red and Blue. Brittle and full of obscure bugs.

Also note that both games were written in assembler, not a higher-level language like C. At the time, C compilers for embedded systems were far less advanced than they are now; they didn't support the Z80 all that well and tended to produce large, slow code. As a result, most Game Boy games were written in assembler, which made them prone to errors such as these.

user

It could be any of the reasons in the previous answers, but it could also be intentional, because of the need to save space. Getting a non-trivial program to fit into a fixed amount of space is no simple task. This is one of the general curses of embedded programming: many of the things you take for granted now simply can't be afforded.

kam

John Burger's answer (laziness) is a good start, but I think it is reductive to put all of these issues down to the developers' laziness.

I/ The lack of kernel-side mitigations

There are still many vulnerabilities in our programs, but they are now harder to exploit, so fewer people talk about them.

On the Game Boy there is no compiler-side or kernel-side mitigation whatsoever. Partly that is because the Game Boy has no kernel and no OS, but also because, unlike modern compilers, the tools of the time had no automatic mitigations: ASLR, DEP, stack cookies and PIE did not exist yet, even in the real operating systems of that period.

II/ The developer tools

Most of the tools for producing safe code are very recent.

AFL was created in 2014 (sources: the AFL page; RPISEC).

Static analyzers, though somewhat older, used to have many implementation issues.

III/ The language used

The Game Boy ROMs were probably written in pure assembly; I have reverse-engineered some. Assembly is even harder to maintain than C, and code that is harder to maintain leads developers to make more mistakes, and also makes them lazier.

IV/ Still a bit of laziness

Today, even with so many modern tools available, around 50% of the C/C++/Rust projects I encounter do not have such tools in their pipeline. The time needed to set them up is ridiculously small, but developers want to focus on the real project instead of security.

  • The question refers to integer overflow, not buffer overruns. The mitigations you've listed don't prevent those, but they make the program crash early when they occur. – wizzwizz4 Aug 12 '22 at 09:30
  • @wizzwizz4: Yes, DEP and stack cookies make the program crash when the corruption occurs. I think my post is relevant because, as with ASLR, they block hackers from gaining remote control, and they also make it harder for a speedrunner to use the vulnerability to save time: with ASLR, the speedrunner won't know where the exploit will jump; with DEP or a stack canary, the program crashes... and the speedrunner fails as well. – ultimate-anti-reversing Jul 08 '23 at 19:33