> A problem with this is that in order to confirm the findings, you’ll need an expert human. But generally expert humans are busy doing other things.

The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.

LLM generated code will eventually contain UB.

EDIT: added "eventually"

flohofwoe 3 hours ago

It would already help a lot if the C and C++ standards started to clean up the list of Undefined Behaviour (e.g. there's a lot of nonsense UB currently in the C standard which could easily become Defined Behaviour, like the "file doesn't end in a new-line character" thing):

https://gist.github.com/Earnestly/7c903f481ff9d29a3dd1

layer8 3 hours ago

The easy cases like you cite are also those that don’t cause problems in practice. I’m not sure that would help all that much, other than to slightly reduce internet criticism.

talkin 3 hours ago

Fixing easy cases makes the list shorter, so enables more focus on harder cases.

And it also signals that you actually do want to improve; just a little bit of the boy scout rule goes a long way.

gpderetta an hour ago

The issue is that the list is infinite (anything not specified is UB), so actually removing any finite amount of UB from the list won't make it shorter.

(only slightly tongue-in-cheek, I do believe that removing silly things is worthwhile).

1718627440 41 minutes ago

The list of UB categories and rules is not infinite. The list of UB programs is, as is the list of all non-UB programs.

thomashabets2 2 hours ago

Author here.

> The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.

Yup. But the point of the article is that even expert humans cannot do this alone. And as I wrote, LLM+junior won't suffice either. We need LLM+senior experts.

And it's a problem that we have way more existing UB than expert capacity.

Now, will LLMs and experts both miss UB in some cases? Of course. There's no 100% solution. But LLMs, I claim, will find orders of magnitude more, with a low false-positive rate, than any expert. Even if these expert humans (like in the OpenBSD case for the two bugs I found, one of which was UB) are given more than three decades to do it.

I didn't even use the best model, a complex code target, or much time. I just wanted a target with a high chance of already having been audited by very good experts.

eru 3 hours ago

Our LLM-powered coding assistants are pretty good at doing lots of busywork that doesn't require all that much smarts. So they can supervise running our UB checks, like Valgrind, and make the linters happy.
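
A rough illustration of that kind of busywork, as a minimal C sketch (the function names are made up for this example and do not come from the article). Building with GCC or Clang using `-fsanitize=address,undefined` and running the tests should turn the off-by-one read in `sum_buggy` into a runtime report when `buf[len]` lies outside the array, after which the fix is mechanical:

```c
#include <assert.h>
#include <stddef.h>

/* Buggy on purpose: `<=` makes the loop read buf[len], one element past
 * the end of the array. That read is undefined behaviour. */
long sum_buggy(const int *buf, size_t len)
{
    long total = 0;
    for (size_t i = 0; i <= len; i++) {
        total += buf[i];
    }
    return total;
}

/* Fixed: `<` keeps every access inside [0, len). */
long sum_fixed(const int *buf, size_t len)
{
    /* A null pointer is only acceptable when there is nothing to read. */
    assert(buf != NULL || len == 0);

    long total = 0;
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}
```

The tool only reports the bad read on inputs that actually reach it, so this still depends on having tests that exercise the code path.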

lelanthran 4 hours ago

> LLM generated code will eventually contain UB.

Yes.

Even in languages other than C (i.e. you will get behaviour that nothing in the input specified).

When LLMs generate code, all languages have UB.

eru 3 hours ago

That's a bit silly.

UB means literally no restrictions. So if your standard says 'you have to crash with an error message' that's already no longer UB.
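
To make that distinction concrete, here is a minimal C sketch (the helper `checked_add` is hypothetical, not something from the article). A plain `a + b` on `int` is UB when it overflows; the wrapper below checks the operands first, so every pair of inputs has a specified outcome, and a specified outcome is by definition not UB:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: adds a and b without ever executing an overflowing
 * signed addition. On overflow it returns false and leaves *out untouched. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);

    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* defined outcome: the caller gets an error */
    }
    *out = a + b;     /* cannot overflow after the check above */
    return true;
}
```

Whether the defined outcome is an error return, a crash with a message, or wrapping arithmetic is a design choice; what matters for this argument is only that the outcome is specified.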

lelanthran 3 hours ago

> So if your standard says 'you have to crash with an error message' that's already no longer UB.

Sure. For crashes. But when you instruct an LLM to do something, the output is probabilistic, so you may get behaviour that is unexpected and/or unwanted. Like storing security tokens in code. Or nuking the production database.