nl an hour ago
Releasing this on the same day as Taalas's 16,000 token-per-second acceleration for the roughly comparable Llama 8B model must hurt! I wonder how far down they can scale a diffusion LM? I've been playing with in-browser models, and the speed is painful. | ||||||||
aurareturn an hour ago
Nothing to do with each other. This is a general optimization; Taalas's product is an ASIC that runs a single tiny 8B model out of on-chip SRAM.

I do wonder how Taalas's approach can scale, though. Making a custom chip for one small model is very different from running models trillions of parameters in size for a billion users. That's roughly 53B transistors for every 8B params, so for a 2T-param model you'd need about 13 trillion transistors, assuming scaling is linear (rough arithmetic sketched below). One chip draws 2.5 kW of power? That's roughly 4x H100 GPUs. How does it draw so much? If you assume the frontier model is around 1.5 trillion parameters, you'd need an entire N5 wafer to run it.

Very interesting tech for edge inference though. Robots and self-driving could make use of these in the distant future if the power draw comes down drastically.
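A quick back-of-envelope sketch of the arithmetic in that comment. The per-parameter transistor ratio, the assumption of linear scaling, and the ~700 W H100 TDP are all taken from the commenter's figures or rough public numbers, not published Taalas specs:

  # Back-of-envelope sketch of the scaling argument above (commenter's assumptions,
  # not Taalas specifications).
  transistors_per_param = 53e9 / 8e9  # ~6.6 transistors per parameter

  def transistors_needed(params):
      """Transistor count for a model with `params` parameters, assuming linear scaling."""
      return params * transistors_per_param

  print(f"2T-param model:   {transistors_needed(2e12):.2e} transistors")    # ~1.3e13
  print(f"1.5T-param model: {transistors_needed(1.5e12):.2e} transistors")  # ~1.0e13

  # Power comparison: one 2.5 kW chip vs. H100s at roughly 700 W each.
  print(f"2.5 kW chip = {2500 / 700:.1f}x an H100's TDP")                   # ~3.6x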
LASR an hour ago
Just tried this. Holy fuck. I'd take an army of high-school graduate LLMs to build my agentic applications over a couple of genius LLMs any day. This is a whole new paradigm of AI. | ||||||||