I don’t recall SPARC systems having separate sockets for discrete FPUs; in particular, the Weitek SPARC POWER µP was a replacement SPARC CPU which derived much of its speed benefit from doubling its internal clock.
On the x86 FPU side of things, quite a few different companies produced FPUs: Intel of course, AMD, IIT, Cyrix, Chips & Technologies... Many of them were marketed as being faster than Intel's 80287 and 80387 designs; Cyrix's FPUs even had it in their name: FasMath 83D87! BYTE volume 15, issue 12 (November 1990) has a detailed FPU comparison, with benchmarks and explanations of the differences between the FPUs.

Most FPUs used the same interface as the Intel FPUs, with a direct CPU-FPU connection and dedicated (ESC) opcodes; the Weitek FPUs were the only exception, using a memory-mapped interface. The IIT and Cyrix FPUs were supposedly faster as-is, and both also added features beyond speed: IIT FPUs had extra register banks, and Cyrix FPUs used more accurate transcendental calculations. Weitek FPUs used a completely different programming model, so they required specific support from applications, and they lacked support for 80-bit extended-precision values. They also required specific motherboard support, and it was possible to build systems with both a 387 and a 3167, or a 486 and a 4167.
As far as benchmarks go, the results from the BYTE issue above match what other benchmarks found: the fastest 286-class FPU was the IIT 2C87 (but the benchmark doesn’t cover Weitek’s 1067), and the fastest 386-class FPU was the Weitek 3167, followed by the Cyrix 83D87. Weitek’s 4167 was faster than the 486’s built-in FPU in some operations but not all.
BYTE volume 13, issue 3 (March 1988) contains a number of articles on FPUs for x86 systems, including the Weitek 1167, and explains how to program them.