| ▲ | podgorniy 10 hours ago |
| This article is for those who have already made up their minds that "spec-based development" isn't for them. I believe (and practice) that spec-based development is one of the future methodologies for developing projects with LLMs; at the very least it will be one of the niches. The author thinks about specs as waterfall. I think about them as a context entrypoint for LLMs. Given enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc.), an LLM can build sufficient initial context for the solution and expand it by reading files and grepping text. And the most interesting part: you can make the LLM keep the context/spec/project file updated each time it updates the project. Voila: now you are in agile again. Just keep iterating on the context/spec/project file. |
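To make the "context entrypoint" concrete, here is a minimal sketch of such a file (the structure and every name in it are invented for illustration):

    # PROJECT.md -- context entrypoint, kept current by the LLM after each change
    ## Purpose
    CLI tool that syncs invoices to a billing API.
    ## User stories
    - As an accountant, I can re-run a failed sync without creating duplicates.
    ## Filesystem
    src/sync/   core sync loop (entry point: src/sync/engine.py)
    src/api/    billing client and retry policy
    ## Core interfaces
    Syncer.run(batch: list[Invoice]) -> SyncReport
    ## Standing instruction
    After every change, update this file so it matches the code.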
|
| ▲ | survirtual 8 hours ago | parent | next [-] |
| This is the key, with test driven dev sprinkled in. You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate. I model this as "grounding". Just like you need to ground an electrical system, you need to ground the LLM to reality. The tests do this, so they are REQUIRED for all LLM coding. Once a framework is established, you require tests for everything. No code is written without tests. These can also be perf tests. They need solid metrics in order to output quality. The tests provide context and documentation for future LLM runs. This is also the same way I'd handle foreign teams, that at no fault of their own, would often output subpar code. It was mainly because of a lack of cultural context, communication misunderstandings, and no solid metrics to measure against. Our main job with LLMs now as software engineers is a strange sort of manager, with a mix of solutions architect, QA director, and patterns expertise. It is actually a lot of work and requires a lot of human people to manage, but the results are real. I have been experimenting with how meta I can get with this, and the results have been exciting. At one point, I had well over 10 agents working on the same project in parallel, following several design patterns, and they worked so fast I could no longer follow the code. But with layers of tests, layers of agents auditing each other, and isolated domains with well defined interfaces (just as I would expect in a large scale project with multiple human teams), the results speak for themselves. I write all this to encourage people to take a different approach. Treat the LLMs like they are junior devs or a foreign team speaking a different language. Remember all the design patterns used to get effective use out of people regardless of these barriers. Use them with the LLMs. It works. |
| |
| ▲ | layer8 20 minutes ago | parent | next [-] | | > Once specs are captured as tests, the LLM can no longer hallucinate. Tests are not a correctness proof. I can’t trust LLMs to correctly reason about their code, and tests are merely a sanity check; they can’t verify that the code was reasoned correctly. | |
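A toy example of that gap: both tests below pass, yet the reasoning behind the code is wrong (it conflates oddness with primality and accepts 9, 15, 25, ...):

    def is_prime(n: int) -> bool:
        # Wrong reasoning that nonetheless satisfies the tests below.
        return n > 2 and n % 2 == 1

    def test_accepts_three():
        assert is_prime(3)

    def test_rejects_four():
        assert not is_prime(4)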
| ▲ | techpression 6 hours ago | parent | prev | next [-] | | > You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate. Except when it decides to remove all the tests, change their meaning to make them pass, or write something not in the spec.
Hallucinations are not a problem of the input given; they are baked into the foundations of LLMs, and so far nobody has solved them. Thinking they won’t happen can and will have really bad outcomes. | | |
| ▲ | CuriouslyC 36 minutes ago | parent | next [-] | | You can solve this easily by having a separate agent write the tests, and not giving the implementing agent write permission on test files. | |
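A minimal sketch of enforcing that split at the filesystem level (the directory layout is assumed; a real setup would more likely use a sandbox or container with read-only mounts):

    import os
    import stat

    def lock_tests(test_dir: str = "tests") -> None:
        # Remove write bits so the implementing agent can read and run,
        # but not edit, the tests the other agent wrote.
        for root, _, files in os.walk(test_dir):
            for name in files:
                path = os.path.join(root, name)
                mode = os.stat(path).st_mode
                os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))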
| ▲ | survirtual 6 hours ago | parent | prev [-] | | It doesn't matter, because use of version control is mandatory. When things go missing or get bypassed, audit-instructed LLMs detect these issues and roll back the changes. I like to keep domains in their own isolated workspaces and git repos. I am not there yet, but I plan on making a sort of local-first gitflow where agents have to pull the codebase, make a new branch, make changes, and submit pull requests to the main codebase. I would ultimately like to make this a one-liner for agents, where new agents are sandboxed with specific tools and permissions, cloning the main codebase. Fresh-context agents can then function as code reviewers, with escalation to higher-tier agents (higher tier = higher token count = more expensive to run) as needed. In my experience, with correct prompting, LLMs will self-correct when exposed to auditors. If mistakes do make it through, it is all version controlled, so rolling back isn't hard. | | |
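A minimal sketch of that clone-and-branch step (the repo URL and naming scheme are placeholders; a real version would also confine the agent's tools and permissions):

    import subprocess
    import tempfile

    def agent_workspace(repo_url: str, branch: str) -> str:
        # Each agent gets a fresh clone and its own branch; changes flow
        # back only through pull requests reviewed by fresh-context agents.
        workdir = tempfile.mkdtemp(prefix="agent-")
        subprocess.run(["git", "clone", repo_url, workdir], check=True)
        subprocess.run(["git", "-C", workdir, "checkout", "-b", branch], check=True)
        return workdir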
| ▲ | CuriouslyC 34 minutes ago | parent [-] | | This is the right flow. As agents get better, work will move from devs orchestrating in IDEs/TUIs to reactive, event-driven orchestration surfaced in the VCS, with developers on the loop. It cuts out the middleman and lets teams collaboratively orchestrate and steer. |
|
| |
| ▲ | skydhash 5 hours ago | parent | prev [-] | | But do you understand the problem and its context well enough to write tests for the solution? Take Prolog and logic programming: it's all about describing the problem and its context and letting the solver find the solution. Try writing your specs in pseudo-Prolog code and you will be surprised by all the missing information you're leaving up to chance. | | |
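A tiny illustration of the point, with the rule written as Python facts and a predicate rather than real Prolog (the spec and all names are invented):

    # Prose spec: "a user can delete a comment they authored."
    facts = {("author", "alice", "c1"), ("author", "bob", "c2")}

    def can_delete(user: str, comment: str) -> bool:
        # Writing even this one rule surfaces questions the prose left open:
        # moderators? deleted accounts? locked threads? soft vs. hard delete?
        return ("author", user, comment) in facts

    assert can_delete("alice", "c1")
    assert not can_delete("alice", "c2")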
| ▲ | survirtual 3 minutes ago | parent [-] | | I am not writing the tests, LLMs are. My objective is to write prompts for LLMs that can write prompts for LLMs that can write code. When there is a problem downstream in the descendant hierarchy, it is a failure of the parent LLM's prompts, so I correct it at the highest level and let the fix trickle down. This eventually resolves into a stable configuration with domain expertise toward whatever function I require, in whatever language is best suited for the task. If I have to write tests manually, I have already failed. It doesn't matter how skilled I am at coding or how capable I am at testing; it is irrelevant. Everything that can be automated should be automated, because it is a force amplifier. |
|
|
|
| ▲ | makeitdouble 9 hours ago | parent | prev | next [-] |
> Giving enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc) What's not waterfall about this is lost on me. Sounds to me like you're arguing waterfall is fine if each full run is fast/cheap enough, which could happen with LLMs and simple enough projects. [0] Agile was offering incremental spec production, which had the tremendous advantage of accumulating knowledge incrementally as well. It might not be a good fit for LLMs, but revising the definition to make it fit doesn't help IMHO. [0] Reminds me that reducing project scope into smaller runs was also a well-established way to make waterfall bearable. |
| |
| ▲ | dtech 8 hours ago | parent | next [-] | | Waterfall with a short iteration time is not possible by definition. You might as well say agile is still waterfall: what are sprints, if not waterfall with a two-week iteration time? And Kanban is just a collection of independent waterfalls... It's not a useful definition of waterfall. | | |
| ▲ | makeitdouble 5 hours ago | parent | next [-] | | Just as most agile projects aren't Agile, most waterfall projects weren't strict Waterfall as it was preached. That being said, when you had, for instance, a project that should take 2 years and involve a dozen teams, you'd try to cut it into 3 or 4 phases, even if it would only be "released" and fully tested at the end of it all. At least if your goal was to have it see the light of day in a reasonable time frame. Where I worked we also did integration runs at given checkpoints to be able to iron out issues earlier in the process. PS: on agile, the main distinctive trait I see is the ability to infinitely extend a project, as the scope and specs are typically set on the go. Which is a feature if you're a contractor for a project. You can't do that with waterfall. Most shops have a mix of pre-planning and on-the-go speccing to get a realistic process. | |
| ▲ | RealityVoid 5 hours ago | parent | prev [-] | | > Waterfall with a short iteration time is not possible by definition. What definition would that be? Regardless, at this point it's all semantics. What I care about is how you do stuff, not the label you assign, and in my book writing specs to ground the LLM is a good idea. And I don't even like specs, but in this instance, it works. |
| |
| ▲ | podgorniy 8 hours ago | parent | prev | next [-] | | > What's not waterfall about this is lost on me. Exactly. There is a spec, but no waterfall is required to work with and maintain it. The article's author dismissed spec-based development precisely because they saw a resemblance to waterfall. But waterfall isn't required for spec-centric development. | |
| ▲ | laserlight 8 hours ago | parent [-] | | > There is a spec, but no waterfall is required to work with and maintain it. The problem with waterfall is not that you have to maintain the spec, but that a spec is the wrong way to build a solution. So, it doesn't matter whether the spec is written by humans or by LLMs. I don't see the point of maintaining a spec for LLMs to use as context. They should be able to grep and understand the code itself. A simple readme or a design document, which should already exist for humans, should be enough. | |
| ▲ | wiseowise 7 hours ago | parent [-] | | > I don't see the point of maintaining a spec for LLMs to use as context. They should be able to grep and understand the code itself. “I don’t see the point of maintaining documentation for developers. They should be able to grep and understand the code itself.” “I don’t see the point of maintaining tests for developers. They should be able to grep and understand the code itself.” “I don’t see the point of compilers/linters for developers. They should be able to grep and find issues themselves.” | |
| ▲ | skydhash 7 hours ago | parent | next [-] | | The thing is that the parallels you are drawing are to things that are very explicitly not the source of the code, but exist alongside it. Code is the ultimate truth. Documentation is a more humane way to describe it. Tests are there to ensure that what exists is what we want. And linters are there to warn us of specific errors. None of these create code. Going from spec to code requires a lot of decisions (each introducing technical debt). Automating the process removes control over those decisions and over the ultimate truth that is the code. Why can't the LLM retain a trace of the decisions, so that it presents control points for altering the results? Instead, it's always a rewrite from scratch. | |
| ▲ | laserlight 5 hours ago | parent | prev [-] | | > “I don’t see the point of maintaining documentation for developers. They should be able to grep and understand the code itself.” I can't see how this comment was made in good faith, when I clearly wrote above that documentation should already exist for humans: > A simple readme or a design document, which should already exist for humans, should be enough. |
|
|
| |
| ▲ | midnitewarrior 8 hours ago | parent | prev [-] | | I see rapid, iterative Waterfall. The downfall of Waterfall is that there are too many unproven assumptions in too long of a design cycle. You don't get to find out where you were wrong until testing. If you break a waterfall project into multiple, smaller, iterative Waterfall processes (a sprint-like iteration), and limit the scope of each, you start to realize some of the benefits of Agile while providing a rich context for directing LLM use during development. Comparing this to agile is missing the point a bit. The goal isn't to replace agile, it's to find a way that brings context and structure to vibe coding to keep the LLM focused. | | |
| ▲ | Fargren 8 hours ago | parent [-] | | "rapid, iterative Waterfall" is a contradiction. Waterfall means only one iteration. If you change the spec after implementation has started, then it's not waterfall. You can't change the requirements, you can't iterate. Then again, Waterfall was never a real methodology; it was a straw man description of early software development. A hyperbole created only to highlight why we should iterate. | | |
| ▲ | Jtsummers 2 hours ago | parent [-] | | > Then again, Waterfall was never a real methodology; it was a straw man description of early software development. A hyperbole created only to highlight why we should iterate. If only this were accurate. Royce's chart (the one at the beginning of his paper, which became Waterfall, though not what he recommended by the end of the paper) has been adopted by the DOD. They're slowly moving away from it, but it's used on many real-world projects and fails about as spectacularly as you'd expect. If projects deliver on time, it's because they blow up their budget and have people work long days and weekends for months or years at a time. If they deliver on budget, it's because they deliver late or cut features. Either way, the pretty plan put into the presentations is not met. People really do (and did) think that the chart Royce started with was a good idea. They're not competent, but somehow they got into management positions that let them force this stupidity. |
|
|
|
|
| ▲ | danielbln 9 hours ago | parent | prev | next [-] |
I would maybe argue that there is a sweet spot for how much you feed in (with some variability depending on the task). I tend to keep my initial instructions succinct, then build them up iteratively. Others write small novels of instructions before they start, which I personally don't like as much. I don't always know what I don't know, so speccing ahead in great detail can sometimes be detrimental. |
| |
| ▲ | podgorniy 9 hours ago | parent [-] | | Agree. I don't use the term "spec" the way it was used in "spec-based development" before LLMs, where details were required to be defined upfront. With LLMs you can start with a vague spec, missing some sections, and clarify it through iterations. The sweet spot will be a moving target: LLMs' built-in assumptions and ways of expanding concepts will change as LLMs develop, so best practices will change along with LLM capabilities. In my experience, the same set of instructions, not too detailed, was handled much better by Sonnet 4 than by Sonnet 3. Sonnet 3.5 was the breaking point for me, showing that context-based LLM development is a feasible strategy. |
|
|
| ▲ | adam1996TL 8 hours ago | parent | prev | next [-] |
| You're right that this is the future, but I believe the thread is misdiagnosing the core 'system error'. The frustration thomascountz describes (tweaking, refining, reshaping) isn't a failure of methodology (SDD vs. Iteration). It's 'cognitive overload' from applying a deterministic mental model to a probabilistic system. With traditional code, the 'spec' is a blueprint for logic. With an LLM, the 'spec' is a protocol for alignment. The 'bug' is no longer a logical flaw. It's a statistical deviation. We are no longer debugging the code; we are debugging the spec itself. The LLM is the system executing that spec. This requires a fundamental shift in our own 'mental OS'—from 'software engineer' to 'cognitive systems architect'. |
| |
| ▲ | skydhash 6 hours ago | parent | next [-] | | I know enough about machine learning and statistics to understand that errors are always there; they just need to be small enough not to matter for the decisions being taken (hopefully). But the thing is that computers can't differentiate errors from correct behavior. Whatever is in the code is taken as truth, and if the result is catastrophic, so be it. As software engineers, it's very often easy to specify what the system should do; ensuring that it doesn't do what it shouldn't do is the tiresome part of the job. And most of the tools we've created exist to ensure the latter. | |
| ▲ | podgorniy 8 hours ago | parent | prev [-] | | I could not have said it better. We're on the same page. I would add that, in my opinion, where code production/management was previously the limiting factor in software development, today it's not. The conceptualisation (ontology, methodology) of the framework (spec-centric development) for producing and maintaining the system (code, artifacts, running system) becomes the new limiting factor. But it's a matter of time before we figure out 2-3 methodologies (as happened with agile's scrum/kanban) which will become the new "baseline". We're at the early stage when the new "laws of LLM development" (as in "laws of physics") are still being figured out. |
|
|
| ▲ | eric-burel 9 hours ago | parent | prev | next [-] |
I would simply replace "LLM" with "agent" in your reasoning, in the sense that you'll need a strong preprocessing step and multiple iterations to exploit such complete specs. |
| |
| ▲ | podgorniy 8 hours ago | parent [-] | | There is sense in your words, especially in the context of modern-day vocabulary. I thought about this sort of methodology before "agent" (which I would define as "side effects with LLM integration") was marketed into the community vocabulary. And I'm still rigidly sticking to what I consider the "basics". Hope that does not impede understanding. |
|
|
| ▲ | RealityVoid 9 hours ago | parent | prev [-] |
I had a small embedded project and I did it >70% using LLMs. This is exactly how I did it. Specs are great for grounding the LLM. Coding with LLMs is going to mean relying more on process, since you can't fully trust them. It means writing specs, writing small models to validate, writing tests, and a lot of code review to understand what the heck it's doing. |