audunw 6 hours ago
It doesn’t make any sense to think you need the whole server to run one model. It’s much more likely that each server runs 10 instances of the model.

1. It doesn’t make sense in terms of architecture. It’s one chip. You can’t split one model over 10 identical hardware chips.

2. It doesn’t add up with their claims of better power efficiency. 2.4kW for one model would be really bad.
aurareturn 6 hours ago | parent
We are both wrong. First, it is likely one chip per model: Llama 8B at q3 with a 1k context could fit in around 3GB of SRAM, which is about the theoretical maximum for a TSMC N6 chip at the reticle limit. Second, their plan is to etch larger models across multiple connected chips. It’s physically impossible to run bigger models otherwise, since ~3GB of SRAM is about the most you can have on an 850mm² chip.
https://mlq.ai/news/taalas-secures-169m-funding-to-develop-a...
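The 3GB figure checks out as back-of-the-envelope arithmetic: 8B parameters at 3 bits each is exactly 3e9 bytes, and a 1k-token KV cache adds comparatively little on top. A rough sketch (the model shape below is the public Llama 3 8B config — 32 layers, 8 KV heads via GQA, head dim 128 — and fp16 for the cache is an assumption, not anything the thread states):

```python
# Back-of-the-envelope memory estimate for running a quantized LLM
# entirely out of on-chip SRAM. Numbers are illustrative, not a
# statement about Taalas's actual design.

def model_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone."""
    return params * bits_per_weight / 8

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 tensors (K and V) per layer, per cached token."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

# Llama 3 8B-ish shape: 32 layers, 8 KV heads (GQA), head_dim 128
weights = model_bytes(8e9, 3)              # 3.00e9 bytes = 3.0 GB
cache = kv_cache_bytes(32, 8, 128, 1024)   # ~0.13 GB at fp16

print(f"weights: {weights/1e9:.2f} GB, kv cache: {cache/1e9:.2f} GB")
```

So weights dominate: ~3.0 GB for the parameters plus ~0.13 GB of cache, right at the claimed SRAM ceiling for a single reticle-limited die.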
moralestapia 6 hours ago | parent
Thanks for having a brain. Not sure who started that "split into 10 chips" claim, it's just dumb. This is Llama 3B hardcoded (literally) on one chip. That's what the startup is about, they emphasize this multiple times.