tybug 5 hours ago

I actually think there's another angle here where PBT helps, which wasn't explored in the blog post.

That angle is legibility. How do you know your AI-written slop software is doing the right thing? One would normally read all the code. Bad news: that's not much less labor-intensive than not using AI at all.

But, if one has comprehensive property-based tests, they can instead read only the property-based tests to convince themselves the software is doing the right thing.
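To make that concrete, here is a minimal sketch of what "reading only the properties" could look like. Everything in it is hypothetical and made up for illustration: the run-length codec (`rle_encode`, `rle_decode`), the two properties, and the tiny `check_property` runner, which uses only the standard library in place of a real PBT library (a real one would also give you smarter generators and shrinking).

```python
import random
from itertools import groupby

# Hypothetical "implementation under review": a run-length codec.
def rle_encode(s: str) -> list[tuple[str, int]]:
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

# Hand-rolled stand-in for a PBT library: random inputs, fixed seed.
def check_property(prop, trials: int = 500, seed: int = 0) -> int:
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        s = "".join(rng.choice("ab ") for _ in range(length))
        assert prop(s), f"property failed for {s!r}"
    return trials

# The part a reviewer would actually read.
def prop_round_trip(s: str) -> bool:
    # Decoding an encoding gives back the original string.
    return rle_decode(rle_encode(s)) == s

def prop_runs_are_maximal(s: str) -> bool:
    # No two adjacent runs share a character.
    encoded = rle_encode(s)
    return all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))

check_property(prop_round_trip)
check_property(prop_runs_are_maximal)
```

The idea is that a reviewer reads the two `prop_` functions and decides whether they describe the behavior they want, without reading the codec itself. Whether two properties are "comprehensive" enough is of course the hard part.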

By analogy: one doesn't need to see the machine-checked proof to know the claim is correct. One only needs to check the theorem statement is saying the right thing.
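A toy Lean 4 illustration of the analogy, with a deliberately trivial statement (the theorem name `sum_comm` is made up for this example): the reader checks the line before `:=`, and the proof checker takes care of what comes after it.

```lean
-- The statement is what a human reviews: addition on Nat commutes.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  -- The proof term is what the checker verifies; here it reuses a core lemma.
  Nat.add_comm a b
```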

pron 5 hours ago

Right, I said that property-based tests are easier to read, and that's good. But people still have to actually read them. Also, because they still work best at the "unit" level, the people reading them need to know how all the units are connected in order to understand them (e.g. a single person cannot review even the PBTs required for 10KLOC per day [1]).

My point isn't so much about PBT, but about how we don't yet know just how much agents help write real software (and how to get the most help from them).

[1]: I'm only using that number because Garry Tan, CEO of YC, claimed to generate 10K lines of text per day that he believes to be working code, while developers working with AI agents know they can't be.