Remix.run Logo
aurareturn 4 hours ago

I was indeed wrong about 10 chips. I thought they would use llama 8B 16bit and a few thousand context size. It turns out, they used llama 8B 3bit with only 1k context size. That made me assume they must have chained multiple chips together since the max SRAM on TSMC n6 for reticle sized chip is only around 3GB.