I can't answer the question "why doesn't the 6502/65C02 use the PC to do the indirection to execute JMP (abs)", but I can answer some of the other points.
Visual6502 is the best site to find out about the inner workings of the 6502.
I'll be using the "Random Control Logic" signal names as used in the simulator, which are the same names used in Hanson's block diagram. If you rotate the diagram by 90 degrees, it roughly corresponds to the silicon (actual size used on silicon is different, so take it with a grain of salt). The block diagram is sufficient to understand what is going on, and easier to read than the silicon.
For the ROM decoder, I'll use the simulator names, which are different from the 6507 mnemonics.
You can follow both types of signals by adding a loglevel of at least 6 to the simulator URL, or by pressing "Trace more" often enough in the expert simulator mode. Decoding has "don't care" values, so some of the signals that show up may be accidental.
How do JMP abs and JMP (abs) work internally?
(What circuitry is used? Is the implementation cheap?)
In general, instructions are executed through several states (numbered T0 to T6). The 6502 has a kind of two-stage pipeline, where the next instruction is already fetched and decoded while the last stage(s) of the previous instruction still execute. In T0, the next opcode is prepared to be fetched; in T1, the opcode is decoded and the byte after the opcode is prepared to be fetched. This is why every instruction execution ends with T0 and T1, and starts with T2 if more than two cycles are needed. (I'd assume T0 is also the phase where the 6502 checks for interrupts, and replaces the opcode with a BRK if it finds one, but I didn't verify this.)
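The state ordering described above can be sketched as a small helper. This is only my reading of the paragraph, not something taken from the simulator; the function name `t_states` is made up for illustration.

```python
def t_states(cycles):
    """Return the assumed T-state order for an instruction of `cycles` cycles.

    Assumption: the states run T2, T3, ... and the instruction
    always finishes with T0 and then T1 (overlapping the next fetch).
    """
    if cycles < 2:
        raise ValueError("6502 instructions take at least two cycles")
    # Middle states exist only when more than two cycles are needed.
    middle = [f"T{n}" for n in range(2, cycles)]
    return middle + ["T0", "T1"]


print(t_states(3))  # JMP abs takes 3 cycles
print(t_states(5))  # JMP (abs) takes 5 cycles
```

Under this reading, the longest (7-cycle) instructions end up using every state from T0 to T6 exactly once.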
In the T2 cycle of JMP abs, the low byte of the destination address is fetched. It is output on the DB bus (DL/DB), then stored in the B input register of the ALU (DB/ADD), added to a zero (SUMS, 0ADD) from the A input, and finally stored temporarily in the adder hold register. This detour is necessary because the T0 cycle already loads the high byte of the destination address onto the ADH bus, and both bytes must be stored in PCL and PCH at the same time (ADD/ADL, ADL/PCL and DL/ADH, ADH/PCH).
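As a rough behavioural sketch of that data flow (not a gate-level model): the low byte is parked in a stand-in for the adder hold register until the high byte arrives, and only then are both halves of the PC written. The function name `jmp_abs`, the flat `memory` array and the local variable names are assumptions for illustration.

```python
def jmp_abs(memory, operand_addr):
    """Model JMP abs; `operand_addr` is the address of the byte after the opcode."""
    # T2: fetch the low byte, pass it through the ALU as B + 0,
    # and keep the result in the adder hold register.
    b_input = memory[operand_addr]
    adder_hold = (b_input + 0) & 0xFF

    # T0: fetch the high byte (it appears on ADH) and load
    # PCL and PCH together from the hold register and ADH.
    adh = memory[(operand_addr + 1) & 0xFFFF]
    pcl, pch = adder_hold, adh
    return (pch << 8) | pcl


memory = bytearray(0x10000)
memory[0x0200] = 0x4C          # JMP abs opcode
memory[0x0201] = 0x34          # destination low byte
memory[0x0202] = 0x12          # destination high byte
print(hex(jmp_abs(memory, 0x0201)))
```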
For JMP (abs), the first cycles are very similar (the only different ROM signal is the absence of T2-jmp-abs), but in T3, the operand address is only output onto the address bus and not stored in the PC. The low byte of the final destination address, which is fetched this way, eventually ends up in the B input register again. But before that, the old value of the B input register is incremented by 1 using the carry input, and the result is used as the low byte of the address of the high byte of the final destination address. Overflow of that increment is not handled, which explains the "infamous wrapping effect". The final transfer to the PC works the same way as for JMP abs.
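The wrapping effect is easy to reproduce in a behavioural sketch: the pointer's low byte is incremented as an 8-bit value and the carry out is dropped, so the high byte of the pointer never changes. The function name `jmp_indirect` and the flat `memory` array are assumptions for illustration; this models the NMOS behaviour described above, not the internal signals.

```python
def jmp_indirect(memory, operand_addr):
    """Model JMP (abs); `operand_addr` is the address of the byte after the opcode."""
    # Fetch the two operand bytes: they form the pointer, which only
    # goes out on the address bus and is never written to the PC.
    ptr_lo = memory[operand_addr]
    ptr_hi = memory[(operand_addr + 1) & 0xFFFF]

    # Read the low byte of the destination through the pointer.
    dest_lo = memory[(ptr_hi << 8) | ptr_lo]

    # Increment only the low byte of the pointer; the carry out is
    # ignored, so the access stays within the same page.
    next_lo = (ptr_lo + 1) & 0xFF
    dest_hi = memory[(ptr_hi << 8) | next_lo]

    return (dest_hi << 8) | dest_lo


memory = bytearray(0x10000)
memory[0x0201], memory[0x0202] = 0xFF, 0x10   # JMP ($10FF)
memory[0x10FF] = 0x34                          # destination low byte
memory[0x1000] = 0x12                          # high byte actually used (wrapped)
memory[0x1100] = 0x56                          # high byte one might expect
print(hex(jmp_indirect(memory, 0x0201)))
```

With the pointer at $10FF, the high byte comes from $1000 rather than $1100, so the jump goes to $1234 instead of $5634.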
So both jumps use the datapath circuitry just like every other instruction; the implementation is neither particularly cheap nor particularly expensive.
How could one test a JMP (abs) implementation using the PC?
(Is there anything that would prevent it from working?)
You could modify the decode ROM and decode logic in the simulator. The simulator uses three files to store the silicon data in a simple format, which are available on GitHub. There are already special signals T3-jmp and T4-jmp for the "indirect" phase, and you may need an additional signal (or re-use one of those two) to ensure that the PC is loaded in the T2 phase in the absence of T2-jmp-abs. And of course you need to rewire the T3 and T4 phases.
You'll likely need some more space for transistors, so a tool to create horizontal or vertical space (increment all coordinates greater than some threshold) in the data files would be helpful.
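A minimal sketch of such a space-making helper is below. It assumes each shape has already been parsed into a flat `[x0, y0, x1, y1, ...]` list; that layout and the name `open_gap` are my assumptions, so check the field order in each data file before applying it (bounding boxes, for instance, may be ordered differently).

```python
def open_gap(coords, threshold, gap, axis="x"):
    """Shift every coordinate on `axis` that is greater than `threshold` by `gap`.

    `coords` is assumed to be a flat list of alternating x and y values.
    """
    if axis not in ("x", "y"):
        raise ValueError("axis must be 'x' or 'y'")
    # x values sit at even indices, y values at odd indices.
    wanted = 0 if axis == "x" else 1
    return [
        value + gap if index % 2 == wanted and value > threshold else value
        for index, value in enumerate(coords)
    ]


polygon = [10, 5, 200, 5, 200, 300]
print(open_gap(polygon, threshold=100, gap=50))            # opens horizontal space
print(open_gap(polygon, threshold=100, gap=50, axis="y"))  # opens vertical space
```

Shapes that straddle the threshold get stretched rather than moved, which is what you want for wires that have to stay connected across the new gap.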
[...] JMP (abs) via the PC in the 4th and 5th cycle wouldn't work. After all, it's identical to the 2nd and 3rd cycles, and loading the PC is known to work because of JMP abs. The ultimate proof would be to modify the ROM/control logic in visual6502 to do that and see what happens. As for interrupts, I always thought interrupts could only happen on the opcode fetch cycle, because the 6502 can't resume mid-instruction, and the first opcode address isn't stored anywhere. – dirkt Aug 13 '16 at 09:11
[...] T0 and T1 states correctly, see answer below. So that's not a problem. – dirkt Aug 14 '16 at 07:42