spiderfarmer 14 hours ago
I always wonder how much smaller and faster models could be if they were trained only on the latest versions of the languages I use. For me that means PHP, SQL, HTML, JS, CSS, Dutch and English, plus tool use for my OS of choice (macOS). Right now it feels like hammering a house onto a nail instead of the other way around.
ACCount37 12 hours ago
Not very. LLMs derive a lot of their capability profile from sheer scale. They have something not entirely unlike the "g factor" in humans: a broad "capability base" that spans domains. The best "coding LLMs" need both good in-domain training for coding specifically and a high capability base. Much of that base comes from model size and from the scale of data and compute used in pre-training.

Shrinking the model and pruning the training data would produce a model with a lower base. It would also hurt in-domain performance, because capabilities generalize and transfer: pruning C code from the training data would "unteach" the model things that also apply to PHP.

So the pursuit of "narrow specialist LLMs" is misguided, as a rule. Unless you have a well-defined bar that, once cleared, means the task is solved, with no risk of scope changes, no benefit from capability improvements above that bar, and enough load to justify the engineering cost of training a purpose-specific model, a "strong generalist" LLM is typically a better bet than a "narrow specialist". In practice, that set of conditions is incredibly rare.
| ||||||||
BarryMilo 13 hours ago
I seem to remember that was one of the first things they tried, but the general models tended to win out. It turns out there's more to learn from all code and discussions than from just JS.
| ||||||||
Someone1234 13 hours ago
Wouldn't that make them bad at migration tasks? For most languages, going from [old] to [current] seems like a fairly common, if not very common, use case.