fooblaster 7 hours ago
FPGAs will never rival GPUs or TPUs for inference. The main reason is that GPUs aren't really GPUs anymore: 50% or more of the die area is fixed-function matrix-multiplication units and their associated dedicated storage. That just isn't general purpose anymore, and FPGAs can't rival it with their configurable DSP slices. They would need dedicated systolic blocks, which they aren't getting. The closest thing is the Versal ML tiles, and those are entire processors, not FPGA blocks. Those have failed by being impossible to program.
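For readers unfamiliar with what those fixed-function blocks actually do: a systolic array computes a matrix multiply by streaming operands through a grid of multiply-accumulate cells. A minimal software sketch (array size and timing are illustrative, not any vendor's design):

```python
# Toy model of a systolic-array matmul: PE (i, j) accumulates C[i][j] as the
# skewed A-row and B-column streams pass through it, one operand pair per cycle.
def systolic_matmul(A, B):
    n = len(A)                        # assume square n x n matrices
    C = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):        # enough cycles to drain the skewed inputs
        for i in range(n):
            for j in range(n):
                k = t - i - j         # pair k reaches PE (i, j) at cycle i + j + k
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]
    return C
```

The point of the fixed-function version is that all n*n multiply-accumulates happen in parallel every cycle with operands handed cell-to-cell, which configurable DSP slices plus routing fabric can't match in density or clock speed.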
fpgaminer 6 hours ago
> FPGAs will never rival gpus or TPUs for inference. The main reason is that GPUs aren't really gpus anymore.

Yeah. Even for Bitcoin mining, GPUs dominated FPGAs. I created the Bitcoin mining FPGA project(s), and they were only interesting for two reasons: 1) they were far more power efficient, which in the case of mining changes the equation significantly; 2) GPUs at the time had poor binary math support, which hampered their performance, whereas an FPGA is just one giant binary math machine.
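Concretely, the "binary math" in Bitcoin mining is SHA-256, whose round functions are nothing but 32-bit rotates, shifts, XORs, and ANDs (as defined in FIPS 180-4) — exactly the logic an FPGA's LUTs implement directly:

```python
# The SHA-256 round primitives: pure bitwise 32-bit operations, no arithmetic
# beyond what maps straight onto FPGA lookup tables.
MASK = 0xFFFFFFFF

def rotr(x, n):                       # 32-bit rotate right
    return ((x >> n) | (x << (32 - n))) & MASK

def ch(x, y, z):                      # "choose": x acts as a per-bit mux
    return ((x & y) ^ (~x & z)) & MASK

def maj(x, y, z):                     # per-bit majority vote
    return (x & y) ^ (x & z) ^ (y & z)

def big_sigma0(x):                    # Σ0: three rotates XORed together
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)
```

GPUs of that era often lacked fast rotate or bitfield instructions, so each rotate cost multiple shifts and ORs, while an FPGA rotate is free (it's just wiring).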
| ||||||||
Lerc 6 hours ago
I think quantisation will get to a point where the GPUs that run these models are more FPGA-like than graphics renderers. If you quantize far enough, things begin to look more like gates than floating-point units. At that level an FPGA wouldn't run your model; it would be your model.
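The extreme case of this is 1-bit (binarized) networks, where weights and activations are +1/-1 and a dot product collapses to XNOR plus popcount — pure gate logic. A minimal sketch of that trick (names are illustrative):

```python
# Dot product of two +/-1 vectors, each packed into an n-bit integer
# (bit = 1 encodes +1, bit = 0 encodes -1). This is the XNOR-popcount
# identity used by binarized neural networks.
def binary_dot(a_bits, b_bits, n):
    agree = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # XNOR: 1 where signs match
    pop = bin(agree).count("1")                  # popcount of agreements
    return 2 * pop - n                           # matches minus mismatches
```

In hardware that whole function is one XNOR gate per bit plus an adder tree, which is why a sufficiently quantized model really does start to look like an FPGA bitstream rather than a program.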
ithkuil 7 hours ago
Turns out that a lot of interesting computation can be expressed as a matrix multiplication.
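One concrete instance: a 1-D convolution (correlation-style, no kernel flip) rewritten as a matrix-vector product with a Toeplitz matrix — the same rewriting (im2col) that lets GPUs run convolutional layers on their matmul units. A sketch, with illustrative names:

```python
# Express a "valid" 1-D correlation as T @ signal, where T is a Toeplitz
# matrix whose rows are shifted copies of the kernel.
def conv_as_matmul(signal, kernel):
    n, k = len(signal), len(kernel)
    out_len = n - k + 1
    T = [[kernel[j - i] if 0 <= j - i < k else 0 for j in range(n)]
         for i in range(out_len)]
    return [sum(T[i][j] * signal[j] for j in range(n)) for i in range(out_len)]
```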
| ||||||||
dnautics 3 hours ago
I don't think this is correct. For inference, the bottleneck is memory bandwidth, so if you can hook an FPGA up to better memory, it has an outside shot at beating GPUs, at least in the short term. I mean, I have worked with FPGAs that outperformed H200s on Llama3-class models a while back.
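The bandwidth-bound claim follows from simple roofline arithmetic: during single-stream decode, every generated token must stream the full set of weights from memory once, so tokens/sec is capped by bandwidth divided by model size regardless of how much compute sits behind it. A back-of-envelope sketch (the numbers in the example are illustrative assumptions, not measured figures):

```python
# Upper bound on decode throughput when weight streaming dominates:
# tokens/sec <= memory bandwidth / bytes of weights read per token.
def decode_tokens_per_sec(params_billions, bytes_per_param, bandwidth_gb_s):
    model_gb = params_billions * bytes_per_param  # weights read once per token
    return bandwidth_gb_s / model_gb

# e.g. an 8B-parameter model at 2 bytes/param (16 GB) on ~4800 GB/s of HBM
# caps out around 300 tokens/sec per stream, no matter the FLOPs available.
```

This is why attaching an FPGA to sufficiently fast memory can, in principle, close the gap for decode even without matching GPU compute density.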
| ||||||||
alanma 6 hours ago
yup, GBs are so much tensor core nowadays :)