admax88qqq 8 hours ago
> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5-year-old Dell desktop.

But if you had such a breakthrough, could you not also apply it to run 200T models in today's datacenters?
pennomi 8 hours ago
That assumes scaling laws still hold up. A bigger model might end up only incrementally more intelligent.
ACCount37 6 hours ago
Not only could you, you would also want to. The likes of Mythos show that the scaling laws are real: you can 5x/2x the total/active params and get meaningful gains. If "inference per param" gets cheaper? Up the params and get more intelligence for the same price.
deweywsu 8 hours ago
Quite true