| ▲ | nixon_why69 3 hours ago | |
Why not have a bunch of SRAM and various operations like "Q4 matmul" in silicon? Model weights and even architectures could still evolve on a platform like that.
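For concreteness, the inner loop of a "Q4 matmul" looks roughly like this (a minimal C sketch; the block size, scale layout, and names are illustrative assumptions loosely modeled on llama.cpp's Q4_0, not any shipping kernel):

```c
#include <stdint.h>
#include <stddef.h>

/* One block of 32 4-bit weights sharing one float scale
   (illustrative layout, not a real on-disk format). */
#define QK 32
typedef struct {
    float   scale;          /* per-block dequantization scale        */
    uint8_t quants[QK / 2]; /* 32 weights, two 4-bit values per byte */
} q4_block;

/* One output element of W*x: n/QK quantized blocks against a float
   activation vector (n must be a multiple of QK). This loop is the
   fixed-function datapath the comment above imagines in silicon. */
float q4_dot(const q4_block *w, const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t b = 0; b < n / QK; b++) {
        for (size_t i = 0; i < QK / 2; i++) {
            uint8_t byte = w[b].quants[i];
            /* unpack two 4-bit values, centered around 8 */
            int lo = (byte & 0x0F) - 8;
            int hi = (byte >> 4) - 8;
            acc += w[b].scale * (float)lo * x[b * QK + 2 * i];
            acc += w[b].scale * (float)hi * x[b * QK + 2 * i + 1];
        }
    }
    return acc;
}
```

The datapath is tiny and regular, which is exactly the kind of thing that hardens well.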
| ▲ | ac29 2 hours ago | parent | next [-] | |
Doesn't "a bunch of SRAM" top out at maybe a few gigs per chip (with zero area used for logic)? You'd need an order of magnitude more to fit even a fairly weak general-purpose LLM.
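Back-of-envelope, with assumed numbers (the bit-cell area and reticle limit are ballpark figures, not vendor specs):

```c
#include <stdio.h>

int main(void) {
    /* Assumed: ~0.021 um^2 per SRAM bit cell (roughly a 5nm-class
       process) and the ~850 mm^2 reticle limit. */
    double bitcell_um2 = 0.021;
    double die_mm2     = 850.0;

    double bits = (die_mm2 * 1e6) / bitcell_um2; /* mm^2 -> um^2 */
    printf("max SRAM per reticle-limit die: ~%.1f GB\n",
           bits / 8.0 / 1e9);                     /* ~5 GB  */

    /* Model sizes at 4 bits per weight: */
    printf("7B model at Q4:  ~%.1f GB\n", 7e9  * 0.5 / 1e9); /* ~3.5 */
    printf("70B model at Q4: ~%.1f GB\n", 70e9 * 0.5 / 1e9); /* ~35  */
    return 0;
}
```

So a reticle-limit die of pure SRAM holds roughly 5 GB: a 7B model at Q4 barely fits with no room left for logic, and anything in the 70B class is an order of magnitude beyond that.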
| ▲ | throwa356262 2 hours ago | parent | prev | next [-] | |
I believe that is what NPUs are. The issue is the huge amount of DRAM and memory bandwidth these models require.
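The bandwidth point is easy to see: at batch size 1, every generated token has to stream all the weights once, so decode speed is roughly bandwidth divided by weight bytes. A sketch with assumed figures (both bandwidth numbers are illustrative):

```c
#include <stdio.h>

int main(void) {
    double model_gb = 3.5;    /* 7B model at 4 bits/weight      */
    double lpddr_gbs = 100.0; /* LPDDR5-class bus, typical NPU  */
    double hbm_gbs = 3000.0;  /* HBM-class GPU, for contrast    */

    /* tokens/sec ~= bandwidth / bytes read per token */
    printf("LPDDR: ~%.0f tok/s\n", lpddr_gbs / model_gb); /* ~29  */
    printf("HBM:   ~%.0f tok/s\n", hbm_gbs / model_gb);   /* ~857 */
    return 0;
}
```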