| ▲ | redox99 6 days ago | |
And I expect Blackwells to hold their value even better (they're already very LLM-optimized, and semiconductor process improvements will slow down). | ||
| ▲ | joefourier 6 days ago | parent | |
Yeah, most of the performance increases have come from architectural improvements like reduced-precision tensor cores. AFAIK FP4 is basically the limit for floating-point matmuls; to go below that you'd need to switch to integer addition, and I don’t think we’ve figured out 1-bit LLMs just yet. | ||
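To make the "FP4 is basically the limit" point concrete, here's a rough sketch that enumerates every value a 4-bit float can represent. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no inf/NaN), which is one common FP4 definition; the comment above doesn't say which FP4 variant is meant, and the function name is just illustrative.

```python
def fp4_e2m1_decode(code: int) -> float:
    """Decode a 4-bit code as sign(1) | exponent(2) | mantissa(1).

    Assumption: E2M1 layout with exponent bias 1 and no inf/NaN encodings.
    """
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0:
        # Subnormal range: no implicit leading 1, so only 0 and 0.5.
        return sign * man * 0.5
    # Normal range: implicit leading 1 plus a single half-step mantissa bit.
    return sign * (1 + man * 0.5) * 2.0 ** (exp - 1)


# All 16 bit patterns; +0 and -0 collapse into one entry in the set.
fp4_values = sorted({fp4_e2m1_decode(c) for c in range(16)})
print(fp4_values)
```

Under that assumption there are only 15 distinct values, with magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6. Take away another bit and there's hardly any room left for an exponent and a mantissa, which is why going lower tends to mean integer-style schemes instead.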