zozbot234 2 hours ago
Not all 1T models are equal. For example: how many active parameters does it have? What's the native quantization? How long is the max context? Also, it's quite likely that some of the smaller models in common use are sub-1T. If your model is light enough, the lower throughput doesn't necessarily hurt you all that much, and you can enjoy the lightning-fast speed.
p1esk an hour ago
Just pick some reasonable values. Also, keep in mind that this hardware must still be useful 3 years from now. What's going to happen to Cerebras in 3 years? What about Nvidia? Which one is the safer bet? On the other hand, competition is good: Nvidia can't have the whole pie forever.
wiredpancake 15 minutes ago
[dead]