__0x01 4 hours ago
> A problem with this is that in order to confirm the findings, you’ll need an expert human. But generally expert humans are busy doing other things.

The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans. LLM generated code will eventually contain UB.

EDIT: added "eventually"
flohofwoe 3 hours ago
It would already help a lot if the C and C++ standards started to clean up the list of Undefined Behaviour (e.g. there's a lot of nonsense UB currently in the C standard which could easily become Defined Behaviour - like the "file doesn't end in a new-line character" thing):
thomashabets2 2 hours ago
Author here.

> The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.

Yup. But the point of the article is that even expert humans cannot do this alone. And as I wrote, LLM+junior won't suffice either. We need LLM+senior experts. And it's a problem that we have way more existing UB than expert capacity.

Now, will LLMs and experts both miss UB in some cases? Of course. There's no 100% solution. But LLMs, I claim, will find orders of magnitude more, with a low false-positive rate, than any expert. Even if these expert humans (like in the OpenBSD case for the two bugs I found, one of which was UB) are given more than three decades to do it.

I didn't even use the best model, a complex code target, or much time. I just wanted to choose a target that had a high chance of already having been audited by very good experts.
eru 3 hours ago
Our LLM-powered coding assistants are pretty good at doing lots of busywork that doesn't require all that much smarts. So they can supervise running our UB checks, like Valgrind, and making the linters happy.
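To make the "UB checks" idea concrete, here is a minimal C sketch of the kind of fix such a check tends to prompt. It is not taken from the article or from any commenter's code: `checked_add` is a hypothetical helper name, and the scenario (signed integer overflow, which is undefined behaviour in C) is assumed purely for illustration. A plain `a + b` on `int` values can overflow; building with GCC or Clang and `-fsanitize=undefined` is one common way to have such an overflow reported at run time, and the helper below avoids it by testing the operands before adding.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (illustrative only, not from the article).
 * Adds a and b without ever executing a signed overflow, which is
 * undefined behaviour in C. Returns false and leaves *out untouched
 * when the mathematical sum does not fit in an int. */
static bool checked_add(int a, int b, int *out)
{
    /* The comparisons are arranged so that neither INT_MAX - b nor
     * INT_MIN - b can itself overflow. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

A caller would check the return value before using the result, for example `if (!checked_add(x, y, &sum)) { /* handle overflow */ }`. The point of the sketch is only that the fix itself is mechanical once the UB has been found; deciding what the caller should do on overflow is where the expert review comes in.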
lelanthran 4 hours ago
> LLM generated code will eventually contain UB.

Yes. Even in languages other than C (i.e. you will get behaviour that nothing in the input specified). When LLMs generate code, all languages have UB.