> I think that the prompt is the thing that should be PR'd at this point, because ultimately the spec is what's important.

The fundamental problem there is the code generation step is non-deterministic. You might make a two sentence change to the prompt to fix a bug and the generation introduces two more. Generate again and everything is fine. Way too much uncertainty to have confidence in that approach.

▲

tombert 7 hours ago | parent [-]

If you make the prompts specific enough and provide tests that it has to run before it passes, then it should be fairly close to deterministic.

Also, people aren't actually reading through most of the code that is generated or merged, so if there's a fear of deploying buggy code generated by AI, then I assure you that's already happening. A lot.

▲

ben-schaaf 4 hours ago | parent [-]

How is "fairly close to deterministic" anywhere near good enough? LLMS aren't anywhere near cheap enough to do this either.

That said it's so trivial to do, why haven't you done that already?

▲

tombert an hour ago | parent [-]

I have actually, for my personal projects. I have been writing a library called "assume" where you can specify a type signature, give it a prompt, and it generates a function on the fly in the background with Claude Code, so you still write some code, but whenever you need a function you "assume" that such a function exists. I have a Java version that works right now and I will likely be pushing it within the next week.

But more generally, I actually have been building some CI stuff to automate how I'm saying.

I don't have much of a say how this is handled at work so they're just committing the generated code, but I actually am doing what I am talking about.

	▲	ben-schaaf 34 minutes ago \| parent [-]
		Sounds like a fun project. And are you committing code for this library? Because it sounds like you are, and if that's the case I don't think you're actually doing what you're talking about.