pron 2 hours ago

The experiment failed to produce a workable C compiler despite (1) the job not being particularly hard and (2) the available specs and tests being of far higher quality than those of almost any other software, not to mention the availability of other implementations that the model was trained on.

You can call that a success (as it did something impressive even though it failed to produce a workable C compiler), but my point in bringing this up was to show that today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.

KajMagnus an hour ago

Saying the model failed to write a competitive C compiler makes more sense.

I don't think they tried to do that though.

> today's models are not yet able to produce production software without close supervision, even when uncharacteristically good specs and hand-written tests exist.

That's a good point anyway.

ianbutler an hour ago

That's great and all, but that's not the point I was making, and you're engaging with it rather uncharitably. Viewed from the perspective of capability increase, it's rather impressive. Note the slope of progress, which is what this experiment was meant to show.

Edit: Maybe uncharitably is too strong, but we're talking past each other.