ACCount37 34 minutes ago
Scale is always desirable, and there are always gains from scale. The question is whether you can afford training and inference at increased scale. There is a real trend of smaller models becoming more "capability-dense", i.e. the best 8Bs of today beat the best 32Bs of 2 years ago. This is in part a product of distillation being used to train the smaller models. But people consistently underestimate how "capability hungry" the world is. There are diminishing returns on model capabilities in "summarize the search results" kinds of applications, but as capabilities improve, LLMs enter new niches, get their footing there, and begin to dominate them. At times those are expensive, highly desirable niches. I do not expect anyone at the frontier to pop up and say "no reason to train a new model" within the next decade. There will always be demand for an LLM that's 5-10% more capable and more reliable at some advanced task, and generational upgrades will keep delivering those 5-10%, from both increased scale and improved training.