How exactly does it do this?
It should be easy for an analog TV to manipulate the signal like this. You may recall that a CRT TV continuously steers an electron beam to hit each "pixel". There are actually no pixels on the screen, but the circuitry generates them by pointing the electron beam at the right place.
So if you want the area swept by the electron beam to be twice as tall, or a centimeter further up, that is quite easy to do by amplifying or offsetting the vertical steering signal. It is not like an LCD, where you need some kind of digital processor to intercept the data for one pixel and send it to a different pixel. A CRT just puts the pixels in different places to begin with.
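To make that concrete, here is a rough numerical sketch of the idea, not a model of any real deflection circuit. It treats the vertical steering signal as a ramp from -1 (top) to +1 (bottom); `vertical_deflection`, `gain` and `offset` are names I made up for illustration.

```python
def vertical_deflection(line, total_lines, gain=1.0, offset=0.0):
    """Normalised vertical beam position for a given scan line.

    -1.0 is the top edge of the normal picture, +1.0 the bottom edge.
    gain stretches the picture vertically, offset shifts it up or down.
    Both are hypothetical knobs standing in for the analog circuitry.
    """
    # Plain ramp: first line maps to -1.0, last line maps to +1.0.
    ramp = 2.0 * line / (total_lines - 1) - 1.0
    # Zooming and shifting are just a multiply and an add on that ramp.
    return gain * ramp + offset


# Same five scan lines, drawn normally and drawn twice as tall.
normal = [vertical_deflection(n, 5) for n in range(5)]
doubled = [vertical_deflection(n, 5, gain=2.0) for n in range(5)]
```

Nothing about the picture content changes here; only the place where each line lands on the glass does.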
Is it possible to preprocess my video signal with a homebrew circuit to achieve the same result (the source is 4:3 letterboxed, the TV is set into true 16:9 mode, and the video manipulation circuit goes between the TV and the signal source)?
Not as easily. Although the CRT's circuitry generates the beam steering signals and makes the picture widescreen by adjusting where the pixels go, there's no way for the video signal to tell it to do that. You can't make a video signal that says "please shift all the pixels up 5 centimeters"; you have to put the picture in different pixels, just like an LCD does.
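For comparison, this is roughly what "putting the picture in different pixels" means for an external box. It is a minimal sketch, assuming the frame has already been digitised into a list of scan lines and that the letterboxed picture sits in the central 75% of them (a 16:9 picture inside a 4:3 frame). `zoom_letterbox` and `active_fraction` are hypothetical names, and nearest-line repetition is used only because it is the simplest resampling to show.

```python
def zoom_letterbox(frame, active_fraction=0.75):
    """Stretch the central band of scan lines to fill the whole frame.

    frame is a list of scan lines (each line can be any object).
    active_fraction is the assumed share of lines holding real picture.
    """
    total = len(frame)
    active = round(total * active_fraction)
    # Index of the first line below the top black bar.
    first = (total - active) // 2
    # Each output line copies the nearest source line in the active band.
    return [frame[first + (i * active) // total] for i in range(total)]


# Eight dummy "lines" labelled 0..7; lines 0 and 7 are the black bars.
stretched = zoom_letterbox(list(range(8)))
```

Note that this needs the whole active band stored somewhere before it can be replayed at a different rate, which is exactly the frame or line memory a CRT gets to skip.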
Well, actually, there is one way for the video signal to tell the TV. If your TV understands this code, the signal can tell the TV that this is a letterboxed widescreen picture, so please zoom automatically.
Do some films automatically display in widescreen instead of letterboxed? Then your TV probably understands this code.
The Wikipedia page shown before has many references describing the code. Your video signal processor would need to count the lines in the video signal, then count to a specific position within the line, and then inject some binary pulses. It should be doable with almost any microcontroller and a little extra circuitry (to make sure the microcontroller can detect the timing pulses at the right voltage and inject the bits at the right voltage). Actually doing it is a project outside the scope of this answer.
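Still, the counting logic itself is simple enough to sketch. The following is a simulation in Python, not firmware: it assumes the sync separator hands you a stream of timestamped vertical ("V") and horizontal ("H") sync events. `TARGET_LINE`, `START_OFFSET_US`, `BIT_PERIOD_US` and `plan_injection` are all placeholders I invented; the real line number, timing and bit encoding have to come from the references.

```python
# Placeholder values only, NOT taken from any standard.
TARGET_LINE = 23        # which line after vertical sync carries the code
START_OFFSET_US = 11.0  # delay from horizontal sync to the first bit
BIT_PERIOD_US = 0.2     # duration of one injected bit


def plan_injection(sync_events, payload_bits,
                   target_line=TARGET_LINE,
                   start_offset_us=START_OFFSET_US,
                   bit_period_us=BIT_PERIOD_US):
    """Return a list of (time_us, bit) pairs saying when to drive each bit.

    sync_events is an iterable of (time_us, kind) pairs, where kind is
    "V" for vertical sync and "H" for horizontal sync.
    """
    line = 0
    schedule = []
    for time_us, kind in sync_events:
        if kind == "V":
            # New field: restart the line count.
            line = 0
        elif kind == "H":
            line += 1
            if line == target_line:
                # On the target line, wait the offset, then clock out bits.
                for i, bit in enumerate(payload_bits):
                    schedule.append(
                        (time_us + start_offset_us + i * bit_period_us, bit))
    return schedule


# Toy run: one field, three lines 64 us apart, inject on the second line.
events = [(0.0, "V"), (64.0, "H"), (128.0, "H"), (192.0, "H")]
plan = plan_injection(events, [1, 0, 1], target_line=2,
                      start_offset_us=10.0, bit_period_us=1.0)
```

On a real microcontroller the loop would become a sync interrupt plus a timer, and the hard part is the analog side mentioned above, not this counting.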