
The x87 instruction set does not support direct transfers between general purpose registers and floating point registers. This is mainly a consequence of the 8087/80287/80387 being a separate chip attached to the same memory bus of which the CPU is largely (but not entirely) ignorant. So all data transfers between the two have to be passed through memory.

Famously, the FSTSW AX instruction introduced with the 80287 is the main exception to this before the later introduction of the FCOMI family of instructions. Various documents on the internet suggest that the instruction is implemented by means of port I/O between the CPU and the FPU through the reserved port range F0 to FF, but I could not find any details.

How exactly is the FSTSW AX instruction implemented on the 80287 (and possibly the 80387) as far as transferring data into AX is concerned? Can the same mechanism be abused by other or custom coprocessors to gain access to the register file?

fuz
  • @fuz From old memory, the instruction you mentioned is recognized by both the 80286 and 80287 as an 80287 ESC instruction. The 80287 ignores the busy bit (neither changes it nor examines it) in its status word (it doesn't care) and performs the operation exclusively over the BIU. No WAIT instruction is required as the 80286 is held off during the BIU transaction. I'm not sure what more detail you need, though. What's going on? (I also have a vague memory of the processor extension data channel but don't recall much about it and I may be confusing the FSTSW AX and the FSTSW.) – jonk Jul 16 '22 at 22:19
  • @fuz (I may be able to track down my old hardware manuals for the 80286 and the 80287. Just not today.) – jonk Jul 16 '22 at 22:21
  • @jonk So how does the status word get into the AX register? With FSTSW mem16, it's clear how it works. But FSTSW AX is a special case. – fuz Jul 16 '22 at 22:25
  • @fuz My recollection remains fuzzy. But my recollection is that the register transfer takes place directly. Have a look at PEREQ, to start. I think that is part of the exchange. I didn't work at Intel until the Pentium II (BX chipset) and didn't need to unearth 80286 transactions at that late date. (Besides, Intel *tightly* controlled their internal documents in their vault and since I had no business accessing those documents I would not have been allowed to check them out and read them.) – jonk Jul 16 '22 at 22:30
  • @jonk According to various pages, this mechanism seems to be for data transfers from/to memory. It is not clear to me how it is decided what direction the transfer goes. Maybe there are additional cases for from/to AX. – fuz Jul 16 '22 at 22:59
  • @fuz Like I said, I'd have to get ahold of my old "80286 Hardware Reference Manual" and my "80287 Hardware Reference Manual." They are kind of thin and I don't recall if I boxed them somewhere, or not. They aren't at hand, right now. There probably is enough information there to figure out what must have been done. Better would be to have Intel's internal documentation. But that's not likely. They are pretty good at keeping it eyes-only, even for those with an appropriate need. I think they should just release them to the public. But it's not likely. – jonk Jul 16 '22 at 23:49
  • @jonk Any progress? – fuz Jul 22 '22 at 18:10
  • No progress as I've been seriously busy, lately. Getting time to track these down likely won't happen on any particular schedule. It'll happen when the right circumstances conspire. I will keep it in mind, though. That said, as I wrote earlier, I believe that pages 2-133ff (of the document you located on the web) provide sufficient information to nail it down. I'd spend less time reading that than looking for my books only to have to read those, too. Have you exhausted your own ability to read about there, and forward? – jonk Jul 22 '22 at 21:27
  • @jonk I've read a lot of this stuff but details are still spotty. What is particularly surprising is that the 80287 pinout has the S0/S1 pins for decoding the instruction stream, but these are NC on the 80287XL; so somehow that does not seem to be necessary for operation of the coprocessor. – fuz Jul 22 '22 at 21:32
  • I'll have to read it thoroughly in order to know whether or not any of it is "particularly surprising." For starters, you know it works. So it's only a matter of having a stunted imagination that something there would be surprising. There's always a mechanism when things work right. It's just a matter of allowing your imagination to carry you to the right place. The internals of the x286 and x287 aren't rocket science. Not compared to these days, anyway. (Took me many months to fully grasp the internals of the Pentium II from internal Intel docs on it.) – jonk Jul 22 '22 at 21:44
  • Have you tried (do you have the equipment and skills necessary) to simply perform these instructions and monitor the bus pins? (Run on a slow clock and use a cheap MSO or logic analyzer.) I often distrust myself (my ability to read a datasheet isn't perfect and datasheets themselves also aren't perfect) and prefer to have hard data to verify my assumptions when reading a document. The x286/x287 devices are pretty slow by today's standards, even not clocked slower, so it should not be difficult. If you are serious about whatever project you are on about, you may want to do that. – jonk Jul 22 '22 at 21:51
  • @jonk I don't have a bus analyser unfortunately, but this kind of analysis is something I plan to perform once I manage to obtain one. My main goal is to understand what's going on. – fuz Jul 22 '22 at 22:05
  • Okay. My first thoughts are: (1) that you don't need a bus analyzer (the name usually implies software algorithms needed to decipher bus cycles of a particular nature), just a logic analyzer capable of signal capture; (2) do you have an 80286/80287/82288/82264 system up and running right now? (I don't have such a system anymore and my Kaypro 286i is long since dead) or are they still available for purchase? But my second realization is that if you don't want such a system then you may be considering an FPGA approach. Or is there another reason for understanding the exact details? I'm curious. – jonk Jul 22 '22 at 23:39
  • I guess I'm trying to find a personal motivation for digging things out for you. – jonk Jul 22 '22 at 23:40
  • @jonk I have many of these at home in different computers, most work. The 80286 is my favourite CPU. I eventually plan to write an operating system for it that makes use of protected mode as it was intended. So I would really like to understand all these details as well as possible. – fuz Jul 23 '22 at 00:40
  • Geez. I actually consider the 80286 as an "80386 forced by C-suite suits to be prematurely released." You cannot switch from protected mode back to real mode without a processor RESET taking place. (It's why the keyboard processor has the ability to do that.) It is VERY slow to get back to real mode from protected. When working at Intel, their training classes on writing protected mode operating systems (which I took) used the 80286 as the basis for the "beginner's 101" class. You will definitely want the books I've got stored away. – jonk Jul 23 '22 at 04:29
  • @jonk I love the 80286 precisely because it is so quirky and weird. Modern RISC designs are just all the same and boring. Know one, know them all. As for getting back to real mode, if you do a triple fault, you can get that reset done a lot quicker. Let me know if you have any literature recommendations! – fuz Jul 23 '22 at 04:35
  • It is quirky. I don't think anyone will argue there. And I very much appreciated seeing some support for Multics finally showing up in the hardware (for the first time in a microprocessor, so far as I'm aware.) And I got a lot of work done on the IBM PC/AT platform. – jonk Jul 23 '22 at 04:52

1 Answer


Since this is about low level operation, let's start with the fact that the CPU/FPU does not provide an FSTSW AX instruction, only an FNSTSW AX. When encountering FSTSW AX, the assembler issues two instructions (*1):

9b       for FWAIT and
DF E0    for FNSTSW AX
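
As a minimal sketch of that expansion: the byte values below come from the listing above, while the `encode` helper and its two-entry mnemonic handling are hypothetical and only serve as illustration.

```python
# Byte values taken from the listing above; the helper itself is illustrative.
FWAIT = bytes([0x9B])
FNSTSW_AX = bytes([0xDF, 0xE0])


def encode(mnemonic: str) -> bytes:
    """Hypothetical helper: expand a status-word store mnemonic to machine code."""
    name = " ".join(mnemonic.upper().split())  # normalise case and spacing
    if name == "FNSTSW AX":
        return FNSTSW_AX
    if name == "FSTSW AX":
        # The waiting form is FWAIT followed by the no-wait form.
        return FWAIT + FNSTSW_AX
    raise ValueError(f"unsupported mnemonic: {mnemonic!r}")


print(encode("FSTSW AX").hex(" "))   # 9b df e0
print(encode("FNSTSW AX").hex(" "))  # df e0
```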

Addressing

The important difference between the 8087 and 80287 is that the 287 no longer snoops the CPU bus, but now acts as an I/O device (*2), which in turn means it can be used by other CPUs as well. Addressing is done using chip select and a two-bit address via:

  • /NPS1, NPS2 - Numeric Processor Select - essentially the chip select signals of the 80287
  • CMD0, CMD1 - Command 0/1 - essentially Port/Register address lines of the FPU

When used with an 80286, these are decoded as the word-sized I/O ports 0F8h, 0FAh, and 0FCh (*3/*4). Note that these addresses are fixed only within the CPU. External decoding is needed to map the 80287 to these addresses:

Address /NPS1 NPS2 CMD1 CMD0
 0F8h     L    H    L    L
 0FAh     L    H    L    H
 0FCh     L    H    H    L
 0FEh     L    H    H    H

This essentially couples (and buffers) A1 and A2 to CMD0 and CMD1. Additionally, the 80287 read/write signals (/NPRD, /NPWR) need to be connected to the CPU's /IORD and /IOWR (or better, to the signals decoded by the 82288).
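
The decoding in the table can be sketched as follows. This is only an illustration: `decode_287_port` is a hypothetical name, and the full 16-bit compare against the 0F8h block is an assumption (a real board may decode fewer address lines).

```python
def decode_287_port(addr: int):
    """Return (selected, cmd1, cmd0) for an I/O address.

    Assumptions: ports 0F8h-0FFh select the coprocessor, A1 drives CMD0
    and A2 drives CMD1, as in the table above.
    """
    selected = (addr & 0xFFF8) == 0x00F8  # address falls into the 0F8h-0FFh block
    cmd0 = (addr >> 1) & 1                # A1
    cmd1 = (addr >> 2) & 1                # A2
    return selected, cmd1, cmd0


for port in (0xF8, 0xFA, 0xFC, 0xFE):
    selected, cmd1, cmd0 = decode_287_port(port)
    print(f"{port:03X}h selected={selected} CMD1={cmd1} CMD0={cmd0}")
```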

The three Ports/Registers are used for 5 distinct transfers:

CMD1/0 (Port) R/W
  00   (0F8h)  W   Opcode to 80287
  00   (0F8h)  R   CW or SW from 80287
  01   (0FAh)  W   Exception Pointer to 80287
  10   (0FCh)  W   Data to 80287
  10   (0FCh)  R   Data from 80287

Command Transfer

A command transfer consists of one or more writes to the first port (0F8h) containing the FPU command to be executed. Depending on the command, the Status Word (SW) or Control Word (CW) can be read from the same port right afterwards.

Data Transfer

For data transfers the 80286 provides a DMA-like mechanism called the Processor Extension Data Channel (PEDC). If an FPU instruction involves a data transfer, the CPU sets up the PEDC address, length and direction from the instruction.

When the FPU is ready to transfer, it raises PEREQ (Processor Extension Request). For a write to the FPU, the CPU reads the data from memory (*5), places it on the data lines and pulls PEACK (Processor Extension Acknowledge) low. The same sequence is used for transfers from the FPU to memory. The end of the transfer (and reinitialization of the PEDC) is signalled by /BUSY going high.

These transfers go to and from Port 0FCh.
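
A toy model of the memory-to-FPU direction may make the sequence easier to follow. Everything here is a simplification under stated assumptions: `ToyFPU` and `pedc_write_to_fpu` are hypothetical names, signal timing, PEACK and /BUSY are not modelled, and the FPU side simply keeps PEREQ asserted until it has received the number of words it expects.

```python
DATA_PORT = 0xFC  # data port from the table above


class ToyFPU:
    """Hypothetical stand-in for the 80287 side of the data channel."""

    def __init__(self, words_wanted: int):
        self.words_wanted = words_wanted
        self.received = []

    @property
    def pereq(self) -> bool:
        # PEREQ stays asserted while operand words are still expected.
        return len(self.received) < self.words_wanted

    def io_write(self, port: int, word: int) -> None:
        if port != DATA_PORT:
            raise ValueError("operand data goes through the data port only")
        self.received.append(word)


def pedc_write_to_fpu(memory: bytes, address: int, fpu: ToyFPU) -> int:
    """CPU side: for every request, fetch one word and hand it to the FPU."""
    while fpu.pereq:
        word = memory[address] | (memory[address + 1] << 8)  # little-endian word
        fpu.io_write(DATA_PORT, word)  # PEACK would be low during this cycle
        address += 2
    return address  # address after the last word transferred


memory = bytes(range(16))
fpu = ToyFPU(words_wanted=5)  # e.g. an 80-bit operand is five words
end = pedc_write_to_fpu(memory, 4, fpu)
print([f"{w:04X}" for w in fpu.received], end)
```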

FSTSW AX

FSTSW AX is done directly as a read of port 0F8h, after writing the FSTSW command to port 0F8h.

(Here it gets a bit fuzzy, as I can't find the code I'm looking for. Many years ago I attached a 287 to a 68k system, but I have only found part of my notes so far. I'm still searching for a command table.)
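
The write-then-read sequence described for FSTSW AX can be sketched as a toy model. This is not a description of the real bus protocol: `Toy287` and `fnstsw_ax` are hypothetical, and how the opcode bytes are packed into the write to port 0F8h is not specified above, so the model just stores them as they are.

```python
CMD_PORT = 0xF8                  # opcode write / SW read port from the table above
FNSTSW_AX = bytes([0xDF, 0xE0])  # opcode bytes from the listing above


class Toy287:
    """Hypothetical coprocessor model: remembers the last opcode written."""

    def __init__(self, status_word: int):
        self.status_word = status_word
        self.last_opcode = None

    def io_write(self, port: int, opcode: bytes) -> None:
        if port == CMD_PORT:
            self.last_opcode = opcode  # packing into a 16-bit word is an assumption

    def io_read(self, port: int) -> int:
        if port == CMD_PORT and self.last_opcode == FNSTSW_AX:
            return self.status_word
        raise RuntimeError("read not modelled in this sketch")


def fnstsw_ax(fpu: Toy287) -> int:
    """CPU side: write the command, then read the status word into 'AX'."""
    fpu.io_write(CMD_PORT, FNSTSW_AX)
    return fpu.io_read(CMD_PORT)


ax = fnstsw_ax(Toy287(status_word=0x3800))
print(f"AX = {ax:04X}h")
```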


*1 - Which opens Pandora's Box of Definitions:

  • Is FSTSW an instruction?
  • Is it a macro?
  • Is it something else?

Or - dogmatists beware - is assembly maybe not a 1:1 representation of machine code?

*2 - The 8087 is really a second processor, as it takes over the bus to perform its own read/write cycles. It is not completely independent, though, as it relies on the CPU to first do a read at the operand address, the data of which the CPU discards. The FPU captures the address for further use and, if a read is to be done, also the first data byte/word. After that it takes over the bus and does all follow-up reads or writes on its own.

The 287 in turn does not handle the bus but relies on the 286 to feed it with nicely aligned 16-bit words. This is also the reason why a 287 cannot operate with an 8088. It simply lacks a way to access memory in general and bytewise memory in particular.

*3 - The 386/387 change data size to 32-bit access using only addresses 0F8h and 0FCh.

*4 - The 386 added A31 to the address as a selector for simple systems (no one will need 4 Gi of address space), so 0F8h/0FCh become 0800000F8h/0800000FCh.

*5 - One cycle when reading word-aligned data, two if byte-aligned. That's why word alignment speeds up an 80287 a tiny bit.
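
The arithmetic behind footnote *5 can be illustrated with a rough sketch. The function name is hypothetical, and it assumes that every word of an operand at an odd address costs two memory cycles instead of one. Only the memory side is counted, not the I/O cycles to the 80287.

```python
def operand_memory_cycles(address: int, size_bytes: int) -> int:
    """Memory bus cycles to fetch an operand word by word (simplified)."""
    words = (size_bytes + 1) // 2             # operand is moved as 16-bit words
    per_word = 1 if address % 2 == 0 else 2   # a misaligned word needs two cycles
    return words * per_word


# An 80-bit (10-byte) operand at an even and at an odd address
print(operand_memory_cycles(0x1000, 10), operand_memory_cycles(0x1001, 10))
```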

Raffzahn
  • According to the datasheet I found, CMD0 and CMD1 are A1 and A2, not A2 and A3. Note that bus snooping is still done to decode the instruction stream (see S1 and S2 pins). This i287XL datasheet has some more explanation! So this all looks like the 80286 does in fact have a vague notion of the 80287 instruction set that goes further than just “d8 to df are coprocessor escapes.” – fuz Jul 16 '22 at 23:56
  • @fuz Note that the 287XL is a 387 with 287 pinout. Also, while the 286 does not 'understand' what 287 instructions do, it does decode all 287 instructions to set up PEDC operation (data address, data length and transfer direction). – Raffzahn Jul 17 '22 at 00:12
  • Interesting, I was not aware of the 286 knowing anything about 287 instructions! The pinout thing is correct, which is why a documentation of these signals on the i287XL will also apply to the 80287. – fuz Jul 17 '22 at 00:23
  • @fuz Erm, I'm not aware that the 287 snoops the bus in any way. Also, what are these S1/S2 supposed to be? Are you referring to NPS1/2? These are the chip selects from address decoding. – Raffzahn Jul 17 '22 at 00:58
  • Sorry, meant S0 and S1 (pins 1 and 2). These are connected to the same pins on the 80286, allowing the 80287 to snoop and decode the instruction stream in parallel as on the 80286. It seems like only memory operands (and AX?) are passed through the special port I/O transaction. Curiously, the i287XL does not use these bits according to the datasheet. – fuz Jul 17 '22 at 01:39
  • Interesting though that port F8 is mentioned for both SW and CW, even though FNSTCW AX does not exist. I wonder if such an instruction was planned. – fuz Jul 17 '22 at 01:47
  • @fuz Interesting, the 287 works fine without any handling of S0/S1 - more so they are N.C. on 287XL, which works quite fine with a regular 286. CW Transfer - not sure if it's really possible. Hand written notes :(( – Raffzahn Jul 17 '22 at 02:20
  • @fuz Thanks for that datasheet capture! It helps a lot. I'm scouting from about 2-133 and forward and it looks like there may be enough there to answer your question. – jonk Jul 17 '22 at 07:11
  • Ot, but can you answer this question: https://stackoverflow.com/questions/28689819/how-does-an-80386-80287-combination-behave-in-32-bit-mode – Yuhong Bao Feb 12 '23 at 02:11
  • @YuhongBao IIRC it's simply that the 287 does not know about the 386, so it will always only request the basic number of transfers. – Raffzahn Feb 12 '23 at 03:06
  • Actually this reminds me of FSETPM/FRSTPM. Is this the only way the 287 determines the format? – Yuhong Bao Feb 12 '23 at 04:06
  • @YuhongBao The 287 is completely self-contained. It does not have any knowledge of the CPU it operates for or with. That's why it was easily adaptable to other CPUs as well (like 68k + 287). Also, I did write a short answer for that SO question. – Raffzahn Feb 12 '23 at 17:37
  • Do you know how the busy bit returned by FNSTSW works? And when would you use FSTSW or FNSTSW? – Yuhong Bao Feb 20 '23 at 06:05