aurareturn 4 hours ago

We are both wrong.

First, it is likely one chip running Llama 8B at q3 quantization with a 1K context. At 3 bits per weight, that works out to roughly 3GB, which is about the theoretical maximum SRAM for a reticle-limited TSMC N6 die.
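
Back-of-envelope, in case anyone wants to check the arithmetic. The quantization, context, and Llama 3 8B shape parameters are my assumptions for illustration, not Taalas's published numbers:

    # Rough SRAM budget for Llama 8B at 3-bit weights, 1K context.
    params = 8.0e9                                  # ~8B weights
    weight_bits = 3                                 # q3 quantization
    weights_gb = params * weight_bits / 8 / 1e9     # = 3.0 GB

    # KV cache, assuming Llama 3 8B shapes: 32 layers, 8 KV heads
    # (GQA), head_dim 128, fp16 entries, 1024-token context.
    layers, kv_heads, head_dim, ctx = 32, 8, 128, 1024
    kv_gb = 2 * layers * kv_heads * head_dim * ctx * 2 / 1e9   # ~0.13 GB

    print(f"weights ~{weights_gb:.1f} GB, kv cache ~{kv_gb:.2f} GB")

So the weights alone land right at ~3GB, and the 1K KV cache adds only ~130MB on top.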

Second, their plan is to etch larger models across multiple connected chips. It's physically impossible to run bigger models otherwise, since ~3GB of SRAM is about the maximum you can fit on an 850mm² die.

  followed by a frontier-class large language model running inference across a collection of HC cards by year-end under its HC2 architecture
https://mlq.ai/news/taalas-secures-169m-funding-to-develop-a...
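
For scale, a hedged back-of-envelope on the chip counts that multi-chip partitioning implies. An even weight split across dies is my assumption about how they'd do it, and the per-die budget comes from the ~3GB figure above:

    import math

    # Chips needed if weights are split evenly across ~3GB-SRAM dies.
    def chips_needed(params_b, bits=3, sram_gb=3.0):
        weights_gb = params_b * 1e9 * bits / 8 / 1e9
        return math.ceil(weights_gb / sram_gb)

    print(chips_needed(8))     # 1 chip  (the current single-chip case)
    print(chips_needed(70))    # 9 chips (a 70B-class model)
    print(chips_needed(405))   # 51 chips (frontier scale)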