TomasBM 8 hours ago

If you have some requirements/specifications, and the piece of code fits them, then it runs.

Alternatively, if you have some vague idea [1] about what you expect to see/have, and the running code satisfies that idea, then it also runs.

Obviously, there are plenty of non-functional requirements (e.g., security, cleanliness, readability) that code should probably meet before anyone finds it acceptable, but those aren't somehow impossible for state-of-the-art models to satisfy either.

[1] Vibe, if you prefer, tho I dislike the term. Another related term is eyeball estimation.

qsera 8 hours ago | parent [-]

But it is hard to verify it, right?

If you use an rsync clone written by an LLM to copy a million files, will you bother to verify that every single one was copied correctly?

TomasBM 6 hours ago | parent [-]

Well, unless you needed those million copies for whatever reason, that is an example of spam or denial-of-service, regardless of how it's generated.

And I'm not disagreeing - it is hard to anticipate what needs verifying, regardless of whether it's functional or non-functional.

But if it's not a spam submission, you could probably design tests or static/dynamic analysis tools that can verify those million copies much faster than manual reviews.
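To make that concrete, here is a minimal sketch of the kind of automated check I mean, assuming the job is a plain directory copy and that comparing SHA-256 digests of file contents is an acceptable definition of "copied correctly". The function names (`file_digest`, `tree_digests`, `compare_trees`) are made up for illustration; it ignores permissions, timestamps, and empty directories.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(source: Path, copy: Path) -> dict[str, list[str]]:
    """Report files that are missing from, extra in, or different in the copy."""
    src, dst = tree_digests(source), tree_digests(copy)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Calling `compare_trees(Path("source_dir"), Path("copy_dir"))` (placeholder paths) returns three lists, and all three being empty means every file's contents matched. It only checks what you thought to check, which is the hard part I mentioned above.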

skydhash 6 hours ago | parent [-]

There’s a reason most projects don’t have a lot of unit tests: a specification, even when fully documented, doesn’t stay static long enough for anyone to write tests against it. And if it’s fluid enough, maintaining those tests will hamper velocity.

So you have integration tests that verify the general specs of the software and rely on your skills to verify the finer details. But if you’re using an LLM (and not reviewing every line), you can no longer be confident about those details.

And reviewing every line kills the speed advantage of using an LLM.