mikkupikku 4 hours ago

You're fooling yourself. It's very easy to get demonstrably working results in an afternoon that would take weeks at least without coding agents. Demonstrably working, as in you can prove the code actually works by then putting it to use.

I had a coding agent write an entire declarative GUI library for mpv userscripts, rendering all widgets with ASS subtitles, then proceeded to prove to my satisfaction that it does in fact work by using it to make a node editor for constructing ffmpeg filter graphs and an in-mpv nonlinear video editor. All of this is stuff I already knew how to do in practice, and had intended to do one day for years now, but I never bit the bullet because I knew it would turn into weeks of poring over auto-generated ASS, pushed to do things it was never intended to do, to figure out why something is rendering subtly wrong. Fairly straightforward but a ton of bitch work. The LLM blasted through it like it was nothing.

Fooling myself? The code works, I'm using it; you're fooling yourself.

zozbot234 3 hours ago | parent | next [-]

> Demonstrably working, as in you can prove the code actually works by then putting it to use.

That's not how you prove that code works properly and isn't going to fail due to some obscure or unforeseen corner case. You need actual proof that's driven by the code's overall structure. Humans do this at least informally when they code; AIs can't do that with any reliability, especially not for non-trivial projects (for reasons that are quite structural and hard to change), so most coding agents simply work their way iteratively to get their test results to pass. That's not a robust methodology.

mikkupikku an hour ago | parent | next [-]

> That's not how you prove that code works properly

Yes it is. What do you expect, formal verification of a toy GUI library? Get real.

> and isn't going to fail due to some obscure or unforeseen corner case.

That's called "a bug"; bugs get fixed when they're found. This isn't aerospace software, where failure is not only an option but an expected part of the process.

> You need actual proof that's driven by the code's overall structure.

I literally don't.

> Humans do this at least informally when they code, AIs can't do that with any reliability

Sounds like a borderline theological argument. Coding agents one-shot problems a lot more often than I ever did. Results are what matters, demonstrable results.

coldtea 3 hours ago | parent | prev [-]

>That's not how you prove that code works properly and isn't going to fail due to some obscure or unforeseen corner case.

So? We never proved that human-written code "isn't going to fail due to some obscure or unforeseen corner case" either (aside from the tiny niche of formal verification).

So in that respect it's quite similar.

>so most coding agents simply work their way iteratively to get their test results to pass. That's not a robust methodology.

You seem to imply they do some sort of random iteration until the tests pass, which is not the case. Usually they can see the test failing, and describe the issue exactly in the way a human programmer would, then fix it.
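A caricature of that loop, for the sake of argument (entirely hypothetical; `propose_fix` stands in for the LLM step, which in a real agent reads the failure message and edits the source):

```python
# Caricature of an agent's test-driven loop: run the tests, read the
# failure, propose a fix, repeat until green.
def run_tests(code):
    env = {}
    exec(code, env)
    try:
        assert env["add"](2, 2) == 4
        return None  # tests pass
    except AssertionError:
        return "add(2, 2) returned the wrong value"

def propose_fix(code, failure):
    # Stand-in for the LLM: a real agent interprets the failure message
    # and edits the source; here it just applies a canned correction.
    return code.replace("a - b", "a + b")

code = "def add(a, b):\n    return a - b\n"  # buggy starting point
for _ in range(3):
    failure = run_tests(code)
    if failure is None:
        break
    code = propose_fix(code, failure)
print(run_tests(code) is None)  # True
```

The point of contention is only what happens inside `propose_fix`: a targeted diagnosis versus repeated guessing until the assertion passes.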

zozbot234 3 hours ago | parent [-]

> describe the issue exactly in the way a human programmer would

Human programmers don't usually hallucinate things out of thin air; AIs like to do that a whole lot. So no, they aren't working the exact same way.

coldtea 3 hours ago | parent [-]

>Human programmers don't usually hallucinate things out of thin air

Oh, you wouldn't believe how much they do that too, or are unreliable in similar ways: bullshitting, thinking they tested X when they didn't, misremembering things, confidently declaring that X is the bottleneck and spending weeks refactoring without ever measuring (only for it to turn out not to be), the list goes on.

>So no, they aren't working the exact same way.

However they work internally, most of the time current agents (from, say, the last year onward) "describe the issue exactly in the way a human programmer would".

qsera 3 hours ago | parent [-]

That is not hallucinating...

LLM hallucinating is not an edge case. It is how they generate output 100% of the time. Mainstream media only calls it "hallucination" when the output is wrong, but from the point of view of an LLM, it is working exactly as it is supposed to...

coldtea 2 hours ago | parent [-]

>LLM hallucinating is not an edge case. It is how they generate output 100% of the time

If it matches reality enough of the time -- which it does -- it doesn't matter. Especially in a coding setup, where you can verify the results, run tests you wrote yourself, and the end goal is well defined.

And conversely, if a human is a bullshitter, or ignorant, or a liar, or stupid, it doesn't matter that they arrive at useless output "in a different way" than a hallucinating LLM. The end result, the low utility of that output, is the same.

Besides, one theory of cognition (predating LLMs, even) treats the human brain as a prediction machine. In that case it's not that different from an LLM in principle, even if the scope and design are better.

bachmeier 3 hours ago | parent | prev [-]

> Fairly straightforward but a ton of bitch work. The LLM blasted through it like it was nothing.

One might argue that this is a substitute for metaprogramming, not software developers.

trollbridge 2 hours ago | parent [-]

It's interesting more people haven't talked about this. A lot of so-called agentic development is really just a very roundabout way to perform metaprogramming.

At my own firm, we generally have a rule that we do almost everything through metaprogramming.
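To illustrate the comparison (my own hypothetical example, not from the thread): classic metaprogramming eliminates the same kind of repetitive grunt work by generating a family of near-identical classes instead of hand-writing, or agent-generating, each one:

```python
# Classic metaprogramming: build a family of near-identical record
# classes at runtime via the three-argument form of type(), instead of
# hand-writing each one.
FIELDS = {"Point": ("x", "y"), "Size": ("w", "h")}

def make_record(name, fields):
    # __init__ assigns each declared field from keyword arguments.
    def __init__(self, **kwargs):
        for f in fields:
            setattr(self, f, kwargs[f])
    def __repr__(self):
        args = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
        return f"{name}({args})"
    return type(name, (), {"__init__": __init__, "__repr__": __repr__})

generated = {name: make_record(name, fs) for name, fs in FIELDS.items()}
p = generated["Point"](x=1, y=2)
print(p)  # Point(x=1, y=2)
```

The difference is that the generator is a fixed, auditable program, whereas an agent produces the expansion token by token; the output may be the same boilerplate either way.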
