27

As I understand it, a modern GPU is actually just a Turing-complete processor which happens to be heavily optimised for massively data-parallel workloads. (You can even buy "graphics cards" that don't generate any graphics!)

As I understand it, earlier graphics accelerators such as the legendary Voodoo 2 were fixed-function devices, hard-wired to do only one thing, but do it extremely fast. Back in those days, what features your games could have depended on your graphics card. If (for example) your graphics card didn't have trilinear filtering, you couldn't have trilinear filtering! End of discussion. So (as I recall it) each new GPU would shout about all the fancy new features it had. [Today, of course, that's all a function of the software you're running, not the hardware that runs it. Oh, wait... hello RTX!]

What I'd like to know is, what did these fixed-function devices actually do? Which exact parts of the rendering pipeline are done by the GPU, and what still needs to be done by the CPU? I presume perspective projection and Z-buffering are done in hardware on the GPU. What else? When you turn around, who rotates the entire world mesh? The CPU or the GPU? Does the CPU say "render this polygon please" for each individual polygon? Or does it upload an entire mesh into the GPU's memory and say "render this mesh" each frame?

knol
  • 11,922
  • 1
  • 44
  • 76
MathematicalOrchid
  • 3,045
  • 1
  • 17
  • 24
  • 7
    I think the simple, general answer is that early GPUs accelerated Glide and OpenGL v1. Exactly how they did so would be specific to the GPU. – Brian H Jul 06 '21 at 13:07
  • 1
    According to https://en.wikipedia.org/wiki/Graphics_processing_unit the term GPU "was coined by Sony in reference to the 32-bit Sony GPU (designed by Toshiba) in the PlayStation video game console, released in 1994". I didn't know that. Are you asking only about 3d accelerating GPUs, or earlier 2d graphic acceleration hardware too? – LAK Jul 06 '21 at 13:36
  • 1
    @LAK I'm specifically interested in game-oriented 3D acceleration. – MathematicalOrchid Jul 06 '21 at 13:39
  • 1
    There's the early stuff, which was mostly about making texture-mapping faster (e.g. S3 ViRGE), and then the slightly later chips that had hardware T&L. If you're interested in understanding what a fixed-function render pipeline did, studying the OpenGL 1.x API might be more fruitful than examining the hardware. – fadden Jul 06 '21 at 14:38
  • 2
    As an aside: on my P200 the supplied S3 ViRGE made texture mapping slower. At least anecdotally. I couldn't prove it at this distance. – Tommy Jul 06 '21 at 14:53
  • 2
    @Tommy My experience with the ViRGE was that it made the game look a little better but decreased the frame rate (e.g. in Descent). So it's not quite fair to compare the performance directly to software rendering, but it didn't really improve the gaming experience in a way that justified the cost of the hardware. – fadden Jul 06 '21 at 18:45
  • Your question would be better with less chit-chat... – Thorbjørn Ravn Andersen Jul 06 '21 at 20:02
  • 1
    I would like to point out that ‘Turing-complete processor’ doesn’t mean much, something can be Turing complete and still be almost useless for general-purpose computing (see for example the famous Magic: The Gathering TCG, which is in fact Turing complete). A much better term in cases like this would be ‘general purpose processor’, though that’s admittedly a bit disingenuous when talking about a GPU (they’re very good at a few specific things, and happen to be useful for a bunch of others as a side effect of how they do those specific things). – Austin Hemmelgarn Jul 06 '21 at 20:38
  • @AustinHemmelgarn Intel once had a project to produce a x86 CPU that was highly optimized for graphics operations so it could be multi-purpose, but the intent was to use it in graphics boards. They dropped it before it reached the market. – Mark Ransom Jul 07 '21 at 00:50
  • @AustinHemmelgarn found more information on that Intel project, it was called Larrabee and was cancelled in 2010. – Mark Ransom Jul 07 '21 at 00:57
  • @MarkRansom In a way Larrabee lived on up until relatively recently, just not as a GPU. Intel’s MIC platform derived significantly from the Larrabee project and actually made it to market, and sold pretty well from what I understand (good enough that they kept producing them until 2020), but it was completely useless for GPU work (and actually kind of useless for most other things too, it was too expensive and too highly specialized to see usage outside of supercomputers, and even there it wasn’t exactly great). – Austin Hemmelgarn Jul 07 '21 at 02:02
  • You should define "early" - e.g. the Nvidia GeForce 2 still had a fixed function pipeline, but it was much more elaborate compared to hardware acceleration of earlier PC graphics cards. And then there is the graphics pipeline in Sun workstations (even earlier), which is fixed function, but done with a coprocessor instead of dedicated non-GP hardware as in the GeForce 2. So the situation was more complex than described in your question. – dirkt Jul 07 '21 at 03:40
  • @dirkt I'm specifically interested in PC-based GPUs, basically up until they became fully programmable. I've already got some nice answers specifically about the Voodoo, which I don't really want to invalidate, but I am interested in the broad strokes of what came after as well. Unsure if I should make a new question or just modify this one. – MathematicalOrchid Jul 07 '21 at 16:55
  • If you wish to go a bit deeper, useful keywords might be "SciTech Display Doctor", formerly named UniVBE. Source code for various versions is obtainable and will indicate various functions that were and were not provided by specific cards. More primitive is the Video BIOS. – Eric Towers Jul 08 '21 at 23:19

2 Answers

27

Having owned a Voodoo 1 back in the day, I can say it did pixel painting only, and was slightly buggy even at that.

The CPU's job was to transform, clip and project all geometry, and to have uploaded any necessary textures to the card at some earlier point. Vertex attributes were then supplied to the GPU for rendering: screen location and depth, texture coordinates and colour. The original Voodoo was a single-texturing GPU only, so e.g. Quake's light maps were achieved in two passes.
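
To make that division of labour concrete, here is a minimal sketch in C of the kind of per-vertex work the CPU had to do before the card ever saw a triangle. The structure, field and function names are made up for illustration (this is not the actual Glide API), and clipping is omitted.

```c
/* Hypothetical per-vertex record, loosely modelled on what a Voodoo-class
   card expected: everything is already in screen space. */
typedef struct {
    float x, y;        /* screen position, in pixels                        */
    float depth;       /* value the card will depth-test against            */
    float oow;         /* 1/w, needed for perspective-correct interpolation */
    float s_over_w;    /* texture coordinates, pre-divided by w             */
    float t_over_w;
    float r, g, b, a;  /* Gouraud colour at this vertex                     */
} ScreenVertex;

/* CPU-side work: transform into clip space, project, and map to the
   viewport.  The model-view-projection matrix is also built on the CPU. */
ScreenVertex cpu_transform_vertex(const float p[3], float s, float t,
                                  const float colour[4],
                                  const float mvp[4][4],
                                  int screen_w, int screen_h)
{
    /* world space -> clip space (the "rotate the entire world mesh" step) */
    float cx = mvp[0][0]*p[0] + mvp[0][1]*p[1] + mvp[0][2]*p[2] + mvp[0][3];
    float cy = mvp[1][0]*p[0] + mvp[1][1]*p[1] + mvp[1][2]*p[2] + mvp[1][3];
    float cz = mvp[2][0]*p[0] + mvp[2][1]*p[1] + mvp[2][2]*p[2] + mvp[2][3];
    float cw = mvp[3][0]*p[0] + mvp[3][1]*p[1] + mvp[3][2]*p[2] + mvp[3][3];

    /* perspective divide and viewport mapping, also on the CPU;
       clipping against the view volume is omitted for brevity */
    ScreenVertex v;
    v.x     = (cx / cw * 0.5f + 0.5f) * (float)screen_w;
    v.y     = (cy / cw * 0.5f + 0.5f) * (float)screen_h;
    v.depth = cz / cw;
    v.oow   = 1.0f / cw;
    v.s_over_w = s * v.oow;   /* the card interpolates s/w, t/w and 1/w per pixel */
    v.t_over_w = t * v.oow;
    v.r = colour[0]; v.g = colour[1]; v.b = colour[2]; v.a = colour[3];
    return v;
}
```

The card then only had to interpolate those values across each triangle's pixels.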

The GPU then did all per-pixel operations — texture mapping, Gouraud shading, optionally fog, and z-buffering. Every one of those steps was optional. Edge anti-aliasing was also available, but in the sense of computing alpha based on pixel coverage and rendering as a partially-translucent pixel. So geometry ordering became a factor.
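
As a rough mental model of that per-pixel stage, here is a caricature in C. It is only a sketch: the names, formats and blend arithmetic are illustrative, not a description of the actual silicon.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Rgb;

/* One pixel's worth of fixed-function work: depth test, modulate the
   filtered texel by the interpolated Gouraud colour, then blend towards
   the fog colour.  On the real chip every step was individually optional. */
Rgb shade_pixel(Rgb texel,            /* fetched + filtered texture sample    */
                Rgb gouraud,          /* colour interpolated between vertices */
                Rgb fog_colour,
                float fog_amount,     /* 0 = no fog, 1 = fully fogged         */
                uint16_t pixel_z,
                uint16_t *zbuffer_entry,
                int *write_pixel)     /* out: should the pixel be written?    */
{
    /* depth test: reject pixels behind what is already in the Z-buffer */
    if (pixel_z >= *zbuffer_entry) { *write_pixel = 0; return texel; }
    *zbuffer_entry = pixel_z;
    *write_pixel = 1;

    /* texture modulated by Gouraud colour */
    Rgb c = {
        (uint8_t)(texel.r * gouraud.r / 255),
        (uint8_t)(texel.g * gouraud.g / 255),
        (uint8_t)(texel.b * gouraud.b / 255),
    };

    /* linear blend towards the fog colour */
    c.r = (uint8_t)(c.r + (fog_colour.r - c.r) * fog_amount);
    c.g = (uint8_t)(c.g + (fog_colour.g - c.g) * fog_amount);
    c.b = (uint8_t)(c.b + (fog_colour.b - c.b) * fog_amount);
    return c;
}
```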

Furthermore, the GPU had a slight bug in its handling of subpixel precision for vertex locations, so you had to snap them to 1/16th-of-a-pixel boundaries.
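
In practice that just meant quantising your screen coordinates before submitting each vertex, along these lines (trivial sketch):

```c
#include <math.h>

/* Snap a screen-space coordinate to the nearest 1/16th of a pixel before
   handing the vertex to the card. */
float snap_to_sixteenth(float coord)
{
    return floorf(coord * 16.0f + 0.5f) / 16.0f;
}
```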

This is all pretty similar to what the GPU in the PlayStation offered, although it is fixed-point only, doesn't offer perspective-correct texturing or subpixel placement of vertices, doesn't do per-pixel fog or z-buffering, and has a much more limited sense of transparency. On the other hand, it's much more flexible in terms of using tiny source textures, or portions of them, with local palettes and optional mirroring repeat patterns; meanwhile the CPU has a dedicated 3d maths coprocessor attached. There's also quite a bit more there for DMA transfer of geometry lists — the usual sort of stuff you'd expect from a graphics-dedicated architecture.

Tommy
  • 36,843
  • 2
  • 124
  • 171
25

What did old-style GPUs actually do?

As so often when it comes to early 3D, Fabien Sanglard's website is a good source. Here, for example, is an article about 3dfx's Voodoo 1/2 cards: The Story of the 3DFX Voodoo 1

As mentioned in the comments, the best way to get an idea of which steps could be moved between units is to look at OpenGL. In general, the most regular functions that get repeated most often are the ones that got moved to the GPU first. A clear winner here is texture mapping: a rather simple series of lookup operations where triangle coordinates get transformed into memory addresses yielding texture data to be output in raster lines.
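
For a feel of what that fixed-function pipeline looked like from the programmer's side, here is a small OpenGL 1.x fragment (context, matrix and texture setup omitted). Every glEnable corresponds to a switchable stage that a card of this era might or might not accelerate; on a Voodoo-class card the driver still ran the vertex transform on the CPU, while texturing, fog and the depth test happened on the card.

```c
#include <GL/gl.h>

void draw_fixed_function_triangle(GLuint texture)
{
    glEnable(GL_TEXTURE_2D);      /* texture mapping  */
    glEnable(GL_DEPTH_TEST);      /* Z-buffering      */
    glEnable(GL_FOG);             /* per-pixel fog    */
    glShadeModel(GL_SMOOTH);      /* Gouraud shading  */
    glBindTexture(GL_TEXTURE_2D, texture);

    glBegin(GL_TRIANGLES);
    glColor3f(1.0f, 1.0f, 1.0f); glTexCoord2f(0.0f, 0.0f); glVertex3f(-1.0f, -1.0f, -5.0f);
    glColor3f(0.5f, 0.5f, 0.5f); glTexCoord2f(1.0f, 0.0f); glVertex3f( 1.0f, -1.0f, -5.0f);
    glColor3f(0.2f, 0.2f, 0.2f); glTexCoord2f(0.5f, 1.0f); glVertex3f( 0.0f,  1.0f, -5.0f);
    glEnd();
}
```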

I presume perspective projection and Z-buffering are done in hardware on the GPU.

Nope - well, it did do Z-buffering, as that too is an easy lookup operation (in fact it used most of the card's RAM bandwidth), but perspective projection was still the CPU's job. The Voodoo 1 is a rather stupid texture engine (with transparency). It does texture filtering. Triangles had to be fed to it in screen coordinates; they got textures applied, and each resulting pixel was checked for depth (Z-buffer) before being written to screen memory.

Everything before that had to be done on the CPU - plus as much culling of obscured objects as possible, as the sustained triangle rate of a Voodoo 1 was about 550k/s, so at 30 fps (which was awesome at the time) a scene had to stay well below 15k triangles. Quite impressive for back then, but it also shows why good texture design was far more of an art back then than it is today.
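
As a rough sanity check on those numbers: 550,000 triangles per second divided by 30 frames per second is about 18,000 triangles per frame at the absolute peak, so budgeting "well below 15k" leaves headroom for overdraw, state changes and frames that take longer than average.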

The Voodoo 2 was not much different, except faster (3M triangles/s) and able to apply two textures at once. This enabled the use of a base texture and a lighting texture in one pass, which was, depending on engine and content, a speed-up of up to 10 times for complex scenes.
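
In OpenGL terms that second texture unit is what multitexturing exposes (via the ARB_multitexture extension at the time, later core in OpenGL 1.3). A sketch of single-pass lightmapping on such hardware might look like the following; it assumes GL 1.3-era headers and omits texture creation and matrix setup.

```c
#include <GL/gl.h>

/* Base texture on unit 0, lightmap on unit 1, modulated together:
   the single-pass equivalent of Quake's two-pass lighting. */
void draw_lightmapped_triangle(GLuint base_tex, GLuint lightmap_tex)
{
    glActiveTexture(GL_TEXTURE0);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, base_tex);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    glActiveTexture(GL_TEXTURE1);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, lightmap_tex);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    glBegin(GL_TRIANGLES);
    glMultiTexCoord2f(GL_TEXTURE0, 0.0f, 0.0f);
    glMultiTexCoord2f(GL_TEXTURE1, 0.0f, 0.0f);
    glVertex3f(-1.0f, -1.0f, -5.0f);

    glMultiTexCoord2f(GL_TEXTURE0, 1.0f, 0.0f);
    glMultiTexCoord2f(GL_TEXTURE1, 1.0f, 0.0f);
    glVertex3f(1.0f, -1.0f, -5.0f);

    glMultiTexCoord2f(GL_TEXTURE0, 0.5f, 1.0f);
    glMultiTexCoord2f(GL_TEXTURE1, 0.5f, 1.0f);
    glVertex3f(0.0f, 1.0f, -5.0f);
    glEnd();
}
```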


When you turn around, who rotates the entire world mesh?

The CPU.

Does the CPU say "render this polygon please" for each individual polygon?

Even more primitive: the CPU first has to turn polygons into triangles and feed those to the Voodoo 1/2.
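
For convex polygons that conversion is just a triangle fan. A minimal sketch, where the emit_triangle callback stands in for whatever call actually submits a triangle to the card:

```c
typedef struct { float x, y, z; } Vec3;

/* Turn a convex n-gon into n-2 triangles (a simple fan), which is all
   the hardware will accept. */
void triangulate_convex_polygon(const Vec3 *verts, int n,
                                void (*emit_triangle)(Vec3, Vec3, Vec3))
{
    for (int i = 1; i + 1 < n; i++)
        emit_triangle(verts[0], verts[i], verts[i + 1]);
}
```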

Or does it upload an entire mesh into the GPU's memory and say "render this mesh" each frame?

Nope, and this was something that contributed to 3dfx's downfall, as Nvidia implemented it first with the GeForce 256 (IIRC) - the starting point of GPUs becoming more flexible.

Raffzahn
  • 222,541
  • 22
  • 631
  • 918
  • This answer seems to say that the Voodoo doesn't do Z-buffering. That implies that you'd have to get the chip to turn triangles into pixels, and then read those back out of the framebuffer to Z-buffer them on the CPU, then write them back to the framebuffer again. That doesn't sound right... – MathematicalOrchid Jul 08 '21 at 08:18
  • @MathematicalOrchid yes, you're right, this needs to be worded more carefully in context. Of course it does Z-buffering, otherwise rasterization would be rather fruitless. – Raffzahn Jul 08 '21 at 11:04
  • 1
    @MathematicalOrchid Do note that while it's true that the Voodoo does do Z-buffering, it's not strictly required - the first Playstation didn't. It did result in plenty of graphical artefacts, of course, but it was still worthwhile. Especially when you consider how much RAM bandwidth it cost the Voodoo :) – Luaan Jul 09 '21 at 10:37
  • @Luaan What's the alternative? The painter's algorithm? – MathematicalOrchid Jul 09 '21 at 12:15
  • @MathematicalOrchid That's one alternative, yes. Just keep in mind nothing we do is perfect - each approach has its own disadvantages and flaws (a z-buffer still has finite detail, for example - z-fighting is one example of that). The PS1 had ridiculously obvious artefacts - in fact, it didn't even use painter's algorithm. The only thing you had was manually ordering whole meshes and using single-sided polygons. Heck, even their famous "running dinosaur" demo had blatantly obvious artefacts that a simple z-buffer would fix. It was a flaw people were eager to forgive. – Luaan Jul 11 '21 at 18:27