marginalia_nu 11 hours ago

LLMs are trained to predict tokens on highly mediocre code though. How will they exceed their training data?

movedx01 11 hours ago | parent | next [-]

Probably the same way other models learned to surpass human ability while being bootstrapped from human-level data - using reinforcement learning.

The question is, do we have good enough feedback loops for that, and if not, are we going to find them? I would bet they will be found for a lot of use cases.

bluGill 11 hours ago | parent | prev | next [-]

Because you ask it to improve things, and so it produces slightly-better-than-average results - even the average person can find things wrong with something and fix them. Then you feed that improved output back in and train a model whose average is better.

/end extreme over-optimism.
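To make that loop concrete, here's a toy sketch (hypothetical numbers only - "code quality" is a scalar, the "model" samples around its corpus mean, and the "reviewer" is assumed to reliably add a small fix per pass, which is exactly the assumption being debated):

```python
import random

def improve(sample, reviewer):
    """One critique-and-fix pass: a reviewer that can only spot flaws
    (not write from scratch) still nudges quality upward."""
    return sample + reviewer(sample)

def train(corpus):
    """Toy 'model': samples around the mean quality of its training data."""
    mean = sum(corpus) / len(corpus)
    return lambda: random.gauss(mean, 1.0)

reviewer = lambda sample: 0.5  # assumed small, consistent fix per pass

corpus = [random.gauss(0.0, 1.0) for _ in range(1000)]  # mediocre data
for generation in range(5):
    model = train(corpus)
    corpus = [improve(model(), reviewer) for _ in range(1000)]
# each generation's mean rises by roughly the reviewer's contribution,
# but only for as long as the reviewer's signal stays reliable
```

The whole argument hinges on that `reviewer` staying honest as quality rises; if its feedback degrades at higher quality levels, the loop stalls.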

Retr0id 11 hours ago | parent | prev | next [-]

Humans can decide to write above-average code by putting in more effort: writing comprehensive tests, refactoring iteratively, optimizing with profiler guidance, etc.

I think you can have LLMs do that too, and then generate synthetic training data for "high-effort code".
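A minimal sketch of that idea as rejection sampling (all hypothetical: `generate`, `run_tests`, and `lint_score` stand in for a model call, a test harness, and a static-analysis pass - none are real APIs):

```python
def make_training_example(prompt, generate, run_tests, lint_score,
                          attempts=8, min_lint=0.9):
    """Sample several candidates for one prompt; keep only the
    highest-scoring one that passes tests, or None if none qualify."""
    best = None
    for _ in range(attempts):
        candidate = generate(prompt)
        if not run_tests(candidate):
            continue  # failing code never enters the corpus
        score = lint_score(candidate)
        if score >= min_lint and (best is None or score > best[0]):
            best = (score, candidate)
    return best[1] if best else None
```

The filtered outputs become the "high-effort" synthetic corpus - the model is trained on its own best work rather than its average work.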

marginalia_nu 11 hours ago | parent | next [-]

Well, state-of-the-art LLMs sure can't consistently produce high-quality code outside of small greenfield projects or tiny demos - a domain that was always easy even for humans, since there are very few constraints to consider and the context is very small.

Part of the problem is that better code is almost always less code. Where a skilled programmer introduces a surgical 1-3 LOC diff, an incompetent programmer introduces 100 LOC. So in the training data, the bad code will almost always outnumber the good.

Retr0id 10 hours ago | parent [-]

Current LLMs do tend to explode complexity if left to their own devices, but I don't think that's an inherent limitation. Mediocre programmers can write good code if they try hard enough and spend enough time on it.

monkaiju 5 hours ago | parent | prev [-]

That's because humans have "understanding" they can use to assess quality. Without understanding, "trying harder" just means spending more "effort" distilling an average result, at best over a larger sample size.

utopiah 11 hours ago | parent | prev [-]

Who are you to question our faith? /s