I think someone could find some way to use the smaller local models to write code. Some kind of framework or harness or language or something. But not too many people are working on that because the big models are pretty cheap and a lot better.

petra 4 hours ago | parent | next [-]

Maybe one possible path (to make weaker models highly capable) is making the job of the LLM as easy as possible.

I wonder if part of the solution is building/finding the right libraries, with the right documentation/language/API (one that plays well with LLMs), and maybe creating some synthetic data around them, to make it very easy for the LLM.

And maybe there could be a business model around creating those libraries.

calgoo an hour ago | parent | next [-]

So in my limited experience: the smaller the model, the bigger the harness. The biggest issue becomes the context window. With big models you can kind of just give them bash access and let them run... while with the smaller ones you need to fully manage the context in each LLM call.
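
A minimal sketch of what "managing the context in each call" could look like, assuming a crude 4-characters-per-token estimate and hypothetical names (`rough_token_count`, `build_prompt`); a real harness would count tokens with the model's own tokenizer:

```python
from typing import List


def rough_token_count(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in the real tokenizer.
    return max(1, len(text) // 4)


def build_prompt(task: str, snippets: List[str], budget: int) -> str:
    """Pack the task plus as many context snippets as fit in the token budget."""
    parts = [task]
    used = rough_token_count(task)
    for snippet in snippets:  # assumed to be ordered most relevant first
        cost = rough_token_count(snippet)
        if used + cost > budget:
            continue  # skip what does not fit; a smaller later snippet may still fit
        parts.append(snippet)
        used += cost
    return "\n\n".join(parts)
```

The point is that the harness, not the model, decides what goes into each call, so the prompt never outgrows the small context window.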

If you can ask the model for a specific function, with a spec (typed languages help too), then the small models are great! I have had good progress generating small Python modules, for example, but you need verification rounds to catch issues.

So test-driven design + a good spec sheet + a very detailed todo.md (or even better, a todo.json, because then the LLM does not need to manage it; you do that from the harness) is your best bet for small models.
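
A rough sketch of a harness-managed todo.json with verification rounds might look like the following. The JSON layout (`spec`, `test`, `status`), the `run_todo` name, and the `generate` callable standing in for the local model are all assumptions for illustration, not an established format:

```python
import json
from typing import Callable, Dict, List


def run_todo(todo_json: str,
             generate: Callable[[str], str],
             max_rounds: int = 3) -> List[Dict]:
    """Work through a todo list; the harness, not the model, tracks status."""
    todo = json.loads(todo_json)
    for item in todo:
        prompt = item["spec"]
        item["status"] = "failed"
        for _ in range(max_rounds):
            code = generate(prompt)
            namespace: Dict = {}
            try:
                # Unsafe outside a sandbox: this executes model-written code.
                exec(code, namespace)          # load the candidate function
                exec(item["test"], namespace)  # run the spec's assertions
            except Exception as err:
                # Feed the failure back for the next verification round.
                prompt = f"{item['spec']}\n\nPrevious attempt failed: {err!r}"
                continue
            item["status"] = "done"
            item["code"] = code
            break
    return todo
```

Each todo item carries its own spec and test, so every model call only needs to see one small task plus the last failure.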

pianopatrick 3 hours ago | parent | prev [-]

I think as well there might be "algorithms" that can work with local LLMs. With local LLMs there is a small context window, but not that much cost per token. So perhaps there is a way to do lots of small prompts that work in a sequence to produce a result.

Like perhaps you could produce 5 versions of a piece of code, and then compare them to choose the best.
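
One hedged way to sketch that "generate several, pick the best" idea: score each candidate by how many assertion snippets it passes and keep the top scorer. The names `score` and `best_of_n` are hypothetical, and counting passed checks is just one possible way to define "best":

```python
from typing import List, Tuple


def score(code: str, checks: List[str]) -> int:
    """Count how many assertion snippets pass against one candidate."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # unsafe outside a sandbox
    except Exception:
        return -1  # candidate does not even load
    passed = 0
    for check in checks:
        try:
            exec(check, namespace)
            passed += 1
        except Exception:
            pass  # a failed check simply earns no point
    return passed


def best_of_n(candidates: List[str], checks: List[str]) -> Tuple[str, int]:
    """Return the highest-scoring candidate and its score (first one wins ties)."""
    scored = [(score(c, checks), c) for c in candidates]
    best_score, best_code = max(scored, key=lambda pair: pair[0])
    return best_code, best_score
```

The candidates could come from five runs of one model or from five different local models; the comparison step is the same.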

Also if the local LLMs can call tools, maybe you can use static analysis tools to catch errors and try again in a loop or process of some sort.
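
As a sketch of that check-and-retry loop, here is a version that uses Python's standard `ast` module as a stand-in for a real static analyzer. The `generate` callable, the `static_errors` and `generate_until_clean` names, and the feedback wording are assumptions; a fuller setup would presumably call an external linter or type checker instead:

```python
import ast
from typing import Callable, List, Optional


def static_errors(code: str, required_function: str) -> List[str]:
    """Very small static check: does it parse, and is the function defined?"""
    try:
        tree = ast.parse(code)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    if required_function not in defined:
        return [f"function {required_function!r} is not defined"]
    return []


def generate_until_clean(prompt: str,
                         generate: Callable[[str], str],
                         required_function: str,
                         max_tries: int = 4) -> Optional[str]:
    """Ask again, with the checker's complaints appended, until the code is clean."""
    feedback = ""
    for _ in range(max_tries):
        code = generate(prompt + feedback)
        errors = static_errors(code, required_function)
        if not errors:
            return code
        feedback = "\n\nFix these issues:\n" + "\n".join(errors)
    return None  # give up after max_tries
```

Because nothing is executed here, this kind of check is cheap and safe to run on every attempt before any tests are tried.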

There also might be certain languages that work better because those languages have better static checks.

jrm4 3 hours ago | parent [-]

Yes. LITERALLY THIS. I do this! Not hypothetical. I'll write a detailed prompt for a function, hand it off to 5 or so models (all of which are on my local machine), wait about 5 min and then compare.

jrm4 3 hours ago | parent | prev [-]

I mean, this is what I'm doing. I'm guessing my process is very different because I'm holding the project's hand much more along the way, but to me even that probably makes for a more enjoyable process.

Which is to say, I might use AI to do an outline/organizational pass, but I'm prompting every chunk of code one by one (e.g., at about the "function" level), which still feels light-years ahead of what I used to do.