aurareturn 5 hours ago
It uses 10 chips for an 8B model, so at the same density an 80B model would need roughly 100 chips. Each chip is the size of an H100, so that's on the order of 100 H100-sized chips to run at this speed. And you can't change the model after the chips are manufactured, since the weights are etched into the silicon.
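
Quick back-of-envelope for that claim, assuming chip count scales linearly with parameter count (the 10-chips-for-8B figure is from this thread, not something the vendor has confirmed):

    # Hypothetical linear scaling: chips grow in proportion to parameters.
    CHIPS_FOR_8B = 10  # chip count claimed above for the 8B model (unconfirmed)

    def chips_needed(params_billions: float) -> float:
        # Scale linearly from the 8B baseline.
        return CHIPS_FOR_8B * (params_billions / 8.0)

    print(chips_needed(80))  # -> 100.0 chips for an 80B model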

9cb14c1ec0 4 hours ago
As many others in this thread have asked, can we get some sources for the idea that the model is spread across chips? You keep making the claim, but no one else (myself included) has any idea where that information comes from or whether it is correct.

grzracz 5 hours ago
I'm sure there is plenty of optimization paths left for them if they're a startup. And imho smaller models will keep getting better. And a great business model for people having to buy your chips for each new LLM release :) | ||||||||

ubercore 5 hours ago
Do we know that it needs 10 chips to run the model? Or are the servers for the API and chatbot just specced with 10 boards to distribute user load? | ||||||||