pipes 5 hours ago

I'll probably be proven wrong eventually, but my main thought about spec-driven dev with LLMs is that it introduces an unreliable compiler. It will produce different results every time it is run, and it's up to the developer to review the changes, which seems like a laborious, error-prone task.

CuriouslyC 42 minutes ago | parent | next [-]

No, this is the right take. Spec-driven development is good, but having loose markdown "specs" that leave a bunch up to the discretion of the LLM is bad. The right approach is a project spec DSL that agents write, which can be compiled via codegen in a more controlled way.
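To make that concrete, a spec entry in such a DSL might look something like this (syntax entirely made up for illustration):

    endpoint UserExport {
      route:   GET /users/{id}/export
      auth:    role >= admin
      returns: ExportJob | NotFound
    }

The codegen layer turns that into routes, validation, and typed stubs the same way on every run; the LLM's discretion is confined to filling in the bodies.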

Kiro 5 hours ago | parent | prev | next [-]

Why would you want to rerun it? In that context a human is also an unreliable compiler. Put two humans on the task and you will get two different results. Even putting the same human on the same task again will yield something different. LLMs producing unreliable output that can't be reproduced is definitely a problem but not in this case.

pipes 4 hours ago | parent | next [-]

I might be misunderstanding the workflow here, but I think if a change request comes in and I alter the spec, I'd need to rerun the LLM bit that generates the code?

kannanvijayan 2 hours ago | parent | next [-]

You'd want to have the alteration reference existing guides to the current implementation.

I haven't jumped headfirst into the "AI revolution", but I have been systematically evaluating the tooling against various use cases.

The approach that tends to give the best results for me combines a collection of `RFI` (request for implementation) markdown documents describing the work to be done with "guide" documents describing the current implementation.

The guide documents need to keep getting updated as the code changes. I do this manually, but the more enthusiastic AI-workflow users would probably automate it as part of their pipeline.

It's important to keep the guides brief. If they get too long they eat context for no good reason. When LLMs write for humans, they tend to be very descriptive. When generating the guide documents, I always add an instruction to tell the LLM to "be succinct and terse", followed by "don't be verbose". This makes the guides into valuable high-density context documents.

The RFIs are then used in a process. For complex problems, I first get the LLM to generate a design doc, then an implementation plan from that design doc, then finally I ask it to implement it while referencing the RFI, design doc, impl doc, and relevant guide docs as context.
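Concretely, the layout ends up as something like this (names purely illustrative):

    docs/
      guides/
        architecture.md            <- terse, updated as the code changes
        storage-layer.md
      rfi/
        rfi-user-export.md         <- the work to be done
        rfi-user-export-design.md
        rfi-user-export-plan.md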

If you're altering the spec, you wouldn't ask it to regen from scratch, but use the guide documents to compute the changes needed to implement the alteration.

I'm using Claude Code primarily.

Kiro 4 hours ago | parent | prev | next [-]

Hm, maybe it's me who misunderstands the workflow. In that case I agree with you.

That said, I think the non-determinism when rerunning a coding task is actually pretty useful when you're trying to brainstorm solutions. I quite often rerun the same prompt multiple times (with slight modifications or using different models) and write down the implementation details that I like before writing the final prompt. When I'm not happy with the throwaway solutions at all I reconsider the overall specification.

However, the same non-determinism has also made me "lose" a solution: I threw one away, and then the real prompt actually performed worse. So nowadays I try to make it a habit to stash the throwaway solutions just in case. There's probably something in Cursor where you can dig out things you backtracked on, but I'm not a power user.
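Plain git covers the stashing, nothing Cursor-specific needed (messages made up):

    git stash push -u -m "attempt 1: event-based approach"
    git stash push -u -m "attempt 2: polling"
    git stash list
    git stash apply 'stash@{1}'    # resurrect a discarded attempt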

sidpatil 4 hours ago | parent | prev [-]

You would need to rerun the LLM, but you wouldn't necessarily need to rebuild the codebase from scratch.

You can provide the existing spec, the new spec, and the existing codebase all as context, then have the LLM modify the codebase according to the updates to the spec.
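The prompt ends up shaped roughly like this (wording hypothetical):

    Attached: spec-v1.md (currently implemented), spec-v2.md (the
    target), and the current source tree. Make the minimal set of
    changes that brings the code in line with spec-v2. Do not touch
    modules outside the spec delta.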

pydry 3 hours ago | parent | prev [-]

Humans are unreliable compilers, but good devs are able to "think outside the box", finding creative ways to protect against their human foibles, while LLMs can't.

If I get a nonsensical requirement, I push back. If I see some risky code, I will think of some way to make it less risky.

mexicocitinluez 4 hours ago | parent | prev [-]

You don't need this type of work to be deterministic. It doesn't really matter if the LLM names a function "IsEven" vs "IsNumberEven".

Have you ever written the EXACT same code twice?

> it introduces an unreliable compiler.

So then, by definition, so are humans. If compiling is "taking text and converting it to code", that's literally what we do.

> it's up to the developer to review the changes, which seems like a laborious, error-prone task.

There are trade-offs to everything. Have you ever worked with an offshore team? They tend to produce worse code and have 1% of the context the LLM does. I'd much rather review LLM-written code than code from "I'm not even the person you hired because we're scamming the system" developers.

tommy_axle 4 hours ago | parent [-]

You want it to be as close to deterministic as possible to reduce the risk of the LLM doing something crazy, like deleting a feature. Sure, the idea is for reviews to catch that, but it's easy to miss in review when there's a lot of noise. I agree that it's very similar to an offshore team that's focused on cranking out code rather than caring about what it does.