| ▲ | canterburry 8 hours ago |
| I vibe coded for months but switched to spec-driven development in the last 6 months. I'm also old enough to have started my career learning the Rational Unified Process and then progressed through XP, agile, scrum, etc. My process is that I spend 2-3 hours writing a "spec" focused on acceptance criteria, and by the end of the day I have a working, tested next version of a feature that I push to production. I don't see how using a spec has made me less agile. My iteration takes 8 hours. However, I see tons of useless specs. A spec is not a prompt. It's an actual definition of how to tell whether something is behaving as intended or not. People are notoriously bad at thinking about correctness in each scenario, which is why vibe coding is so big. People defer thinking about what correct and incorrect actually looks like for a whole wide scope of scenarios and instead choose to discover through trial and error. I get 20x ROI on well-defined, comprehensive, end-to-end acceptance tests that the AI can run. They fix everything from big-picture functionality to minor logic errors. |
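A minimal sketch of what one such spec point might look like when written as an end-to-end acceptance test an agent can run; pytest is assumed as the harness, and the client, endpoints, and coupon scenario are hypothetical rather than taken from the comment above:

```python
# One spec point expressed as an executable acceptance test.
# Assumes a hypothetical ApiClient wrapper around the deployed service.
import pytest

from app.client import ApiClient  # hypothetical application client


@pytest.fixture
def client():
    # Points at a locally running instance of the service under test.
    return ApiClient(base_url="http://localhost:8000")


def test_expired_coupon_is_rejected_at_checkout(client):
    """Spec: applying an expired coupon must fail with a clear reason and
    must leave the cart total unchanged."""
    cart = client.create_cart(items=[{"sku": "ABC-1", "qty": 2}])
    total_before = cart["total"]

    result = client.apply_coupon(cart["id"], code="SUMMER-2019")  # long expired

    assert result["status"] == "rejected"
    assert result["reason"] == "coupon_expired"
    assert client.get_cart(cart["id"])["total"] == total_before
```

The idea being described is that each acceptance criterion pins down observable behaviour, so the agent can run the suite and converge on it rather than relying on the reviewer to notice drift.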
|
| ▲ | pipes 5 hours ago | parent | next [-] |
| I'll probably be proven wrong eventually, but my main thought about spec driven dev with llms is that it introduces an unreliable compiler. It will produce different results every time it is run, and it's up to the developer to review the changes, which just seems like a laborious, error-prone task. |
| |
| ▲ | CuriouslyC 41 minutes ago | parent | next [-] | | No, this is the right take. Spec-driven development is good, but having loose markdown "specs" that leave a bunch up to the discretion of the LLM is bad. The right approach is a project spec DSL that agents write, which can be compiled via codegen in a more controlled way. | |
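As a rough illustration of that distinction (not an existing tool), a spec DSL could be structured data that a deterministic codegen step expands into test stubs; every name and field below is an assumption:

```python
# Sketch of a tiny spec DSL: specs as structured data, compiled into pytest
# stubs by a deterministic codegen step, rather than loose markdown prose.
from dataclasses import dataclass, field


@dataclass
class SpecPoint:
    name: str    # machine-friendly identifier
    given: str   # precondition
    when: str    # action
    then: str    # observable outcome


@dataclass
class FeatureSpec:
    feature: str
    points: list[SpecPoint] = field(default_factory=list)

    def to_test_stubs(self) -> str:
        """Compile the spec into test stubs; an agent then fills in the bodies."""
        lines = [f"# Auto-generated from spec: {self.feature}"]
        for p in self.points:
            lines += [
                "",
                f"def test_{p.name}():",
                f'    """Given {p.given}, when {p.when}, then {p.then}."""',
                "    raise NotImplementedError",
            ]
        return "\n".join(lines)


checkout = FeatureSpec(
    feature="coupon handling",
    points=[
        SpecPoint(
            name="expired_coupon_rejected",
            given="a cart with items and an expired coupon code",
            when="the coupon is applied at checkout",
            then="the request is rejected and the cart total is unchanged",
        ),
    ],
)

print(checkout.to_test_stubs())
```

Because the expansion is plain codegen, rerunning it on the same spec produces the same stubs; only the bodies the agent writes vary.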
| ▲ | Kiro 5 hours ago | parent | prev | next [-] | | Why would you want to rerun it? In that context a human is also an unreliable compiler. Put two humans on the task and you will get two different results. Even putting the same human on the same task again will yield something different. LLMs producing unreliable output that can't be reproduced is definitely a problem but not in this case. | | |
| ▲ | pipes 4 hours ago | parent | next [-] | | Might be misunderstanding the workflow here, but I think if a change request comes in and I alter the spec, I'd need to re-run the LLM bit that generates the code? | | |
| ▲ | kannanvijayan 2 hours ago | parent | next [-] | | You'd want to have the alteration reference existing guides to the current implementation. I haven't jumped in headfirst to the "AI revolution", but I have been systematically evaluating the tooling against various use cases. The approach that tends to produce the best results for me combines a collection of `RFI` (request for implementation) markdown documents describing the work to be done with "guide" documents. The guide documents need to keep getting updated as the code changes. I do this manually, but the more enthusiastic AI workflow users would probably make this an automated part of their workflow. It's important to keep the guides brief. If they get too long they eat context for no good reason. When LLMs write for humans, they tend to be very descriptive. When generating the guide documents, I always add an instruction to tell the LLM to "be succinct and terse", followed by "don't be verbose". This makes the guides into valuable high-density context documents. The RFIs are then used in a process. For complex problems, I first get the LLM to generate a design doc, then an implementation plan from that design doc, then finally I ask it to implement it while referencing the RFI, design doc, impl doc, and relevant guide docs as context. If you're altering the spec, you wouldn't ask it to regen from scratch, but use the guide documents to compute the changes needed to implement the alteration. I'm using Claude Code primarily. | |
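For a sense of how those documents might be pulled together before a run, one simple approach is to stitch them into a single context block; the file names and layout below are made up for the example, not taken from the comment above:

```python
# Assemble guide docs, the RFI, and its derived design/plan docs into one
# context block to hand to the coding agent. Paths are illustrative only.
from pathlib import Path

DOCS = [
    "docs/guides/architecture.md",      # brief, high-density guide docs
    "docs/guides/data-model.md",
    "docs/rfi/0042-rate-limiting.md",   # the request-for-implementation
    "docs/rfi/0042-design.md",          # design doc generated first
    "docs/rfi/0042-plan.md",            # implementation plan generated second
]


def build_context(paths=DOCS) -> str:
    parts = [f"## {p}\n\n{Path(p).read_text()}" for p in paths]
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_context())
```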
| ▲ | Kiro 4 hours ago | parent | prev | next [-] | | Hm, maybe it's me who misunderstands the workflow. In that case I agree with you. That said, I think the non-determinism when rerunning a coding task is actually pretty useful when you're trying to brainstorm solutions. I quite often rerun the same prompt multiple times (with slight modifications or using different models) and write down the implementation details that I like before writing the final prompt. When I'm not happy with the throwaway solutions at all I reconsider the overall specification. However, the same non-determinism has also made me "lose" a solution that I threw out and where the real prompt actually performed worse. So nowadays I try to make it a habit to stash the throwaway solutions just in case. There's probably something in Cursor where you can dig out things you backtracked on but I'm not a power user. | |
| ▲ | sidpatil 4 hours ago | parent | prev [-] | | You would need to rerun the LLM, but you wouldn't necessarily need to rebuild the codebase from scratch. You can provide the existing spec, the new spec, and the existing codebase all as context, then have the LLM modify the codebase according to the updates to the spec. |
| |
| ▲ | pydry 3 hours ago | parent | prev [-] | | Humans are unreliable compilers, but good devs are able to "think outside of the box" and use creative ways to protect against their human foibles, while LLMs can't. If I get a nonsensical requirement I push back. If I see some risky code I will think of some way to make it less risky. |
| |
| ▲ | mexicocitinluez 4 hours ago | parent | prev [-] | | You don't need this type of work to be deterministic. It doesn't really matter if the LLM names a function "IsEven" vs "IsNumberEven". Have you ever written the EXACT same code twice? > it introduces an unreliable compiler. So then by definition so are humans. If compiling is "taking text and converting it to code", that's literally us. > it's up to the developer to review the changes, which just seems like a laborious, error-prone task. There are trade-offs to everything. Have you ever worked with an off-shore team? They tend to produce worse code and have 1% of the context the LLM does. I'd much rather review LLM-written code than "I'm not even the person you hired because we're scamming the system" developers. | | |
| ▲ | tommy_axle 4 hours ago | parent [-] | | You want it to be as close to deterministic as possible to reduce the risk of the LLM doing something crazy like deleting a feature or functionality. Sure, the idea is for reviews to catch it, but it's easier to miss things when there is a lot of noise. I agree that it's very similar to an offshore team that's just focused on cranking out code rather than caring about what it does. |
|
|
|
| ▲ | dakinitribe 8 hours ago | parent | prev | next [-] |
| Could I see one of your specs as an example? |
|
| ▲ | noosphr 7 hours ago | parent | prev | next [-] |
| > People defer thinking about what correct and incorrect actually looks like for a whole wide scope of scenarios and instead choose to discover through trial and error. |
|
| LLMs are _still_ terrible at deriving even the simplest logical entailments. I've had the latest and greatest Claude and GPT derive 'B instead of '(not B) from '(and A (not B)) when 'A and 'B are anything but the simplest of English sentences. I shudder to think what they decide the correct interpretation of a spec written in prose is. |
| |
|
| ▲ | mattmanser 7 hours ago | parent | prev | next [-] |
| Seems like you are all just redefining what spec and waterfall mean. A spec came from a customer and would detail every feature. They would be huge, but usually lack enough detail or be ambiguous. They would be signed off by the customer and then you'd deliver to the spec. It would contain months, if not years, worth of work. Then after all this work the end product would not meet the actual customer needs. A day's work is not a spec. It's a ticket's worth of work, which is agile. Agile is an iterative process where you deliver small chunks of work and the customer course corrects at regular intervals: commonly 3-4 week sprints, made up of many tickets that take hours or days, per course correction. Generally each sprint had a spec, and each ticket had a spec. But it sounds like until now you've just been winging it, with vague definitions per feature. It's very common, especially where the PO or PM are bad at their job, or the developer is informally acting as PO. Now that you're writing specs per ticket, you're just doing what many development teams already do. You're just bizarrely calling it a new process. It's like watching someone point at a bicycle and insist it's a rocketship. |
| |
| ▲ | hgomersall 6 hours ago | parent [-] | | A customer generally provides requirements (the system should do...) which are translated into a spec (the module/function/method should do...). The set of specs maps to the requirements. Requirements may be derived from or represented by user stories, and specs may or may not be developed in an agile way or written down ahead of time. Whether you have or derive requirements and specs is entirely orthogonal to development methodology. People need to get away from the idea that having specs is any more than a formal description of what the code should do. The approach we take is that the specs are developed from the tests, and each test exercises its spec point in its entirety. That is, a test and a spec are semantically synonymous within the code base. An interesting thing we're playing with is using the specs alongside the signatures to have an LLM determine when the spec is incomplete. | | |
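A small sketch of what "a test and a spec are semantically synonymous" could mean in practice; the decorator, registry, and discount example are illustrative assumptions rather than the poster's actual setup:

```python
# Each spec point is carried by exactly one test; the spec text is attached to
# the test, so the suite itself is the formal description of behaviour.
SPEC_REGISTRY: dict[str, str] = {}


def spec(point_id: str, text: str):
    """Register a spec point and bind it to the test that exercises it."""
    def wrap(test_fn):
        SPEC_REGISTRY[point_id] = text
        test_fn.__doc__ = f"[{point_id}] {text}"
        return test_fn
    return wrap


def apply_discount(total: float, percent: float) -> float:
    # Toy implementation under test.
    if not 0 <= percent <= 100:
        raise ValueError("percent out of range")
    return round(total * (1 - percent / 100), 2)


@spec("PRICE-7", "Discounts outside 0-100% must be rejected, not clamped.")
def test_discount_out_of_range_is_rejected():
    try:
        apply_discount(50.0, 150.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

The registry then gives a direct map from spec point IDs to the tests that exercise them, which is also a convenient artefact to hand an LLM when asking where the spec is incomplete.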
| ▲ | throwaway173738 an hour ago | parent [-] | | A spec consists of three different kinds of requirements: functional requirements, non-functional requirements, and constraints. It's supposed to fully describe how the product responds to the context and the desires of stakeholders. The problem I see a lot with Agile is that people over-focus on functional requirements in the form of user stories, which in your case would be statements like "X should do…" | | |
| ▲ | hgomersall 6 minutes ago | parent [-] | | I don't necessarily disagree, but can you give an example of a non-functional requirement that influences the design? |
|
|
|
|
| ▲ | spacecadet 4 hours ago | parent | prev [-] |
| Same. I fancy myself a decent technical communicator and architect. I write specs consisting of giant lists of acceptance criteria, on my phone, lying in bed... Kick that over to some agents to bash on, check in and review here and there, maybe a little mix of vibing and careful corrections by me, and it's done! Usually in less time. But any time an agent is working on work shit, I'm working on my race car... so it's a win-win-win to me. I'm still using my brain, no longer slogging through awful "human centered" programming languages, and I have more time for my hobbies. Isn't that the dream? Now, to crack this research around generative gibber-lang programming... 90% of our generative code problems are related to the programming languages themselves: intended for humans, optimized for human interaction, speed, and parsing. Let the AIs design, speak, write, and run the code. All I care about is that the program passes my tests and does what I intended. I don't care if it has indents, or other dogmatic aspects of what makes one language as usable as any other; no "my programming language is better!", who cares. Loving this era. |