dbbk 3 hours ago
I'm frankly surprised the focus is still on these enormous "know everything in the world" models. I would think you could create an incredibly lean and smart "just React and React Native" model.
onion2k an hour ago
"Make a React app to run my coffee shop" requires knowing React but also knowing what a coffee shop is.
nikcub an hour ago
The syntax is the easier part - most programming tasks require the reasoning and understanding of a large world model to solve problems. Fine-tuning a 'lean and smart' model works really well for discrete, repeatable, high-volume tasks like support ticket triage, lead classification, content filtering, labelling, generating content with a voice, etc. Inefficient token burn from throwing large models at everything is definitely a problem - it's like hiring PhDs to answer the phone or wash dishes.
onlyrealcuzzo 3 hours ago
> I would think you could create an incredibly lean and smart "just React and React Native" model.

You can, but it's not as useful as you might think. It needs to understand at least one human language to understand your intent when you ask it to implement features.

If GRAM turns out to be a 5000x multiplier for local reasoning, you could theoretically train a 500M parameter model on just a programming language to understand stack traces and fix bugs, and it would be incredibly powerful.

But most people also want it to understand human language so it can implement features as well. Then it can't just understand React and JavaScript - it needs to understand thousands of commonly used dependencies, the DOM, CSS, HTML, etc. And for that you need A LOT more parameters than you might expect.

You can definitely get a ~3B active parameter model that runs comfortably on today's hardware to be VERY good at coding once all of the SOTA architectures are combined in a single model - especially if we get better tool calling to give models better context per language.

You might be thinking: why does it need to memorize dependencies? Can't it just stick all of them in its context and use its super smart brain?

No, context is king. You want to keep it as short as possible. The solution is not having a smart model and putting 10M lines of context in it. The solution is having a model with enough parameters to know what it needs to know.

Researchers are already working on "packs" of knowledge, where you could download a 20M param pack just for some common dependencies in JavaScript (as an example) - but AFAIK this is likely years away (and may not prove effective).

You could get 100x performance if you feed the models ideal context. So a 3B model today can perform almost as well as a ~300B model if you give it really good context instead of flooding it with mostly garbage it doesn't need from across your repository.

If you feed it 100x more context to make up for its limited memorized general knowledge, it's going to perform thousands of times worse, completely eliminating any advantage it might get from GRAM...
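
To make the "ideal context vs. flooding" point concrete, here is a rough sketch of picking only the repository snippets that look relevant to a request and stopping at a token budget. Everything in it is an assumption for illustration: `Snippet`, `select_context`, the word-overlap relevance score, and the regex word count standing in for a real tokenizer are all hypothetical, and a real tool would likely use embeddings or a language-aware index instead.

```python
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    path: str  # hypothetical file path inside the repository
    text: str  # the snippet's source text


def tokens(text: str) -> list[str]:
    # Crude stand-in for a real tokenizer: lowercase alphanumeric words.
    return re.findall(r"[a-z0-9_]+", text.lower())


def select_context(query: str, snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily keep the most query-relevant snippets that fit in `budget` tokens."""
    query_terms = set(tokens(query))
    scored = []
    for snippet in snippets:
        toks = tokens(snippet.text)
        overlap = len(query_terms & set(toks))
        if overlap:
            # Score by overlap density so long, mostly irrelevant files rank low.
            scored.append((overlap / len(toks), len(toks), snippet))
    scored.sort(key=lambda item: item[0], reverse=True)

    chosen, used = [], 0
    for _, cost, snippet in scored:
        # Skip anything that would push the prompt over the budget.
        if used + cost <= budget:
            chosen.append(snippet)
            used += cost
    return chosen
```

With a small budget, a short component file that mentions the request's terms gets in, while an unrelated file or a huge file with one passing mention stays out. That is the "short, relevant context" side of the trade-off described above, not a claim about how any particular coding tool does it.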