> long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info)

I think spec-driven generation is the antithesis of chat-style coding for this reason. With tools like Claude Code, you are the one tracking what was already built, what interfaces exist, and why something was generated a certain way.

I built Ossature[1] around the opposite model. You write specs describing behavior, it audits them for gaps and contradictions before any code is written, then it produces a TOML build plan where each task declares exactly which spec sections and upstream files it needs. The LLM never sees more than that, and there is no accumulated conversation history to drift from. Every prompt and response is saved to disk, so traceability is built in rather than something you reconstruct by scrolling back through a chat. I used it over the last couple of days to build a CHIP-8 emulator entirely from specs[2]. I have some more example projects on GitHub[3].

1: https://github.com/ossature/ossature

2: https://github.com/beshrkayali/chomp8

3: https://github.com/ossature/ossature-examples
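
To make the "never sees more than that" idea concrete, here is a minimal Python sketch of per-task context assembly. It is not Ossature's actual plan schema or code: the `Task` fields, `build_prompt`, the section names, and the file paths are all hypothetical, chosen only to show a prompt being built from what a task declares and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One build-plan task (hypothetical shape, not Ossature's real schema)."""
    name: str
    spec_sections: list = field(default_factory=list)
    upstream_files: list = field(default_factory=list)


def build_prompt(task, spec, files):
    """Assemble a prompt from only the sections and files the task declares."""
    parts = [f"# Task: {task.name}"]
    for section in task.spec_sections:
        parts.append(f"## Spec: {section}\n{spec[section]}")
    for path in task.upstream_files:
        parts.append(f"## File: {path}\n{files[path]}")
    return "\n\n".join(parts)


# Illustrative spec sections and upstream files (made up for this sketch).
spec = {
    "memory": "4 KiB of RAM; programs load at 0x200.",
    "display": "64x32 monochrome framebuffer.",
    "timers": "Delay and sound timers tick at 60 Hz.",
}
files = {
    "src/memory.py": "class Memory: ...",
    "src/display.py": "class Display: ...",
}

task = Task("cpu", spec_sections=["memory", "timers"], upstream_files=["src/memory.py"])
prompt = build_prompt(task, spec, files)
print(prompt)
```

The display section and `src/display.py` exist in the project but never reach the prompt, because the task did not declare them. There is also no conversation history argument at all, which is the point.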

▲

comboy 9 hours ago | parent | next [-]

Hey, you seem to have a similar view on this. I know ideas are cheap, but hear me out:

You talk with agent A, and it only modifies the spec. You can still chat and say "make it prettier", but that agent only touches the spec. The spec could also separate "explicit" from "inferred".

And of course agent B which builds only sees the spec.

The user can actually care about the diffs generated by agent A again, because nobody wants to verify diffs of agent-generated code that is full of repetition and created by search and replace. I believe that if somebody implements this right, it will be the way things are done.

And of course, with better models the spec can be used to actually meaningfully improve the product.

Long story short, what the industry currently misses, and what you seem to understand, is that intent is sacred. It should always be stored, preferably verbatim and always with relevant context ("yes exactly" is obviously not enough). The current generation of LLMs can already handle all that. It would mean something like 2-3x the cost, but it seems well worth it (and in the long run the cost could likely go below 1x, given typical workflows and repetitions).
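
A rough Python sketch of what "store the intent verbatim, with context" plus an explicit/inferred split could look like as data. Every name here (`SpecEntry`, `record`, `needs_review`) is made up for illustration; this is one possible shape, not an existing tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecEntry:
    text: str     # the statement as it appears in the spec
    origin: str   # "explicit" (the user said it) or "inferred" (the agent filled it in)
    intent: str   # the user's message, stored verbatim
    context: str  # what that message was replying to


def record(spec, text, origin, intent, context):
    """Return a new spec with one more entry; the intent is never paraphrased."""
    if origin not in ("explicit", "inferred"):
        raise ValueError(f"unknown origin: {origin!r}")
    return spec + [SpecEntry(text, origin, intent, context)]


def needs_review(spec):
    """Inferred entries are the ones a user may want to confirm or reject."""
    return [entry for entry in spec if entry.origin == "inferred"]


question = "Agent asked: should the app always wait for user feedback before continuing?"

spec = []
spec = record(
    spec,
    "The app always waits for user feedback before continuing.",
    "explicit",
    intent="yes exactly",
    context=question,
)
spec = record(
    spec,
    "A spinner is shown while waiting.",
    "inferred",
    intent="yes exactly",
    context=question,
)

for entry in needs_review(spec):
    print(entry.text)
```

On its own, "yes exactly" says nothing; stored next to the question it answered, it is enough to explain why the first entry exists and to flag the second as the agent's guess.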

▲

beshrkayali 8 hours ago | parent | next [-]

Right, the spec/build separation is exactly the idea and Ossature is already built that way on the build side.

I agree a dedicated layer for intent capture makes a lot of sense. I thought about that as well, I am just not fully convinced it has to be conversational (or free-form conversational). Writing a prompt to get the right spec change is still a skill in itself, and it feels like it'd just be shifting the problem upstream rather than actually solving it. A structured editing experience over specs feels like it'd be more tractable to me. But the explicit vs inferred distinction you mention is interesting and worth thinking through more.

▲

comboy 7 hours ago | parent | next [-]

A spec manually crafted by the user is ideal.

It's just that we're lazy. After being able to chat, I don't see people going back. You can't just paste some error into the specs, and you can't paste in an image and say "make it look more like this". Plus, however well designed the spec is, something like "actually make it always wait for the user feedback" can trigger changes in many places (even if only for the sake of removing contradictions).

	▲	ithkuil 5 hours ago \| parent [-]
		The spec can be wrong for many reasons:
		1. You can write a spec that builds something that is not what you actually wanted.
		2. You can write a spec that is incoherent with itself or with the external world.
		3. You can write a spec that doesn't have sufficient mechanical sympathy with the tooling you have, so it requires you to spec out more and more of the surrounding tech than you practically can.
		All of those issues can be addressed by iterating on the spec with the help of agents. It's just an engineering practice, one that we have to become better at understanding.

▲

4b11b4 3 hours ago | parent | prev | next [-]

yep but spec isn't the root

▲

hansonkd 5 hours ago | parent | prev | next [-]

I've been thinking a lot about this lately. It seems like what is missing with most coding agents is a central source of truth. Before, the truth of what the company was building, and the alignment around it, was distributed: people had context about what they did and what others did and were doing.

Now the coding agent starts fresh each time, and it's up to you to understand what you asked it and provide the feedback loop.

Instead of chat -> code, I think chat -> spec and then spec -> code is much more the future.

The spec -> code phase should be independent from any human. If the spec is unclear, ask the human to clarify the spec, then use the spec to generate the code.

What happens today is that something is unclear and there is a loop where the agent starts to uncover some broader understanding, but then it is lost in the next chat. And the human also doesn't learn why their request was unclear. "Memories" and Agents files are all duct tape over this problem.

▲

xrd 2 hours ago | parent | prev | next [-]

This is really fascinating and lines up with my way of development.

I notice you support ollama. Have you found it effective with any local models? Gemma 4?

I'm definitely going to play with this.

▲

alfiedotwtf 19 minutes ago | parent | prev | next [-]

How does this differ from Superpowers?

▲

Yokohiii 10 hours ago | parent | prev | next [-]

I like it a lot, I find the chat driven workflow very tiring and a lot of information gets lost in translation until LLMs just refuse to be useful.

How does the human intervention work out? Do you use a mix of spec and audit editing to get into the ready to generate state? How high is the success/error rate if you generate from tasks to code, do LLMs forget/mess up things or does it feel better?

The spec driven approach is potentially better for writing things from scratch, do you have any plans for existing code?

	▲	beshrkayali 9 hours ago \| parent [-]
		Thanks!
		> How does the human intervention work out? Do you use a mix of spec and audit editing to get into the ready to generate state?
		Yes, the flow is: you write specs, then you validate them with `ossature validate`, which parses them and checks they are structurally sound (no LLM involved), then you run `ossature audit`, which flags gaps or contradictions in the content as INFO, WARNING, or ERROR level findings. The audit has its own fixer loop that auto-resolves ERROR level findings, but you can also run it interactively, manually fix things yourself, address the INFO and WARNING findings as you see fit, and rerun until you are happy. From that it produces a toml build plan that you can read and edit directly before anything is generated. You can reorder tasks, add notes for the LLM, adjust verification commands, or skip steps entirely. So when you run `ossature build` to generate, the structure is already something you have signed off on. There are a few more details under the hood; I wrote more in an intro post[1] about Ossature, which might be useful.
		> The spec driven approach is potentially better for writing things from scratch, do you have any plans for existing code?
		Right now it is best for greenfield, as you said. I have been thinking about a workflow where you generate specs from existing code and then let Ossature work from those, but I am honestly not sure that is the right model either. The harder case is when engineers want to touch both the code and the specs, and keeping those in sync through that back and forth is something I want to support but have not figured out a clean answer for yet. It's on the list; if you have any thoughts, please feel free to open an issue! I want to get through some of the issues I am seeing with just the spec editing workflow (and re-audit/re-planning) first, specifically around how changes cascade through dependent tasks.
		Regarding success rate, each task requires a verification command to run and pass after generation, and if it fails, a separate fixer agent tries to repair it using the error output. The number of retry attempts is configurable. I did notice that the more concise and clear the spec is, the more likely it is for capable models to generate code that works (obviously), but that's what auditing is supposed to help with. One interesting case about the CHIP-8 emulator I mentioned above is that even mentioning the correct name of the solution to a specific problem was not enough; I had to spell out the concrete algorithm in the spec (wrote more details here[2]). But the full prompt and response for every task is saved to disk, so when something does go wrong, one can read the exact prompt/response and fix-attempt prompt/response for each task.
		1: https://ossature.dev/blog/introducing-ossature/
		2: https://log.beshr.com/chip8-emulator-from-spec/
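
The verify-then-fix loop described above can be sketched in a few lines of Python. This is only an illustration of the general pattern under stated assumptions, not Ossature's implementation: `build_task`, `max_retries`, and the stub `fake_generate` / `fake_fix` functions are hypothetical stand-ins for the real generator and fixer agents, and the log is kept in memory here rather than written to disk.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def verify(command):
    """Run a verification command; return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def build_task(generate, fix, target, verify_command, max_retries=2):
    """Generate code, verify it, and hand failures to a fixer with the error output."""
    log = []  # every attempt is recorded so a failure can be traced later
    code = generate()
    for attempt in range(max_retries + 1):
        target.write_text(code)
        passed, output = verify(verify_command)
        log.append({"attempt": attempt, "code": code, "output": output, "passed": passed})
        if passed:
            return True, log
        if attempt < max_retries:
            code = fix(code, output)
    return False, log


# Stub "agents" standing in for LLM calls (hypothetical, for demonstration only).
def fake_generate():
    return "assert 1 + 1 == 3, 'bad arithmetic'\n"


def fake_fix(code, error_output):
    return code.replace("== 3", "== 2")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "task.py"
    ok, log = build_task(fake_generate, fake_fix, target, [sys.executable, str(target)])

print(ok, len(log))
```

The fixer only ever receives the current code and the verifier's output, and the number of retries is a plain parameter, which mirrors the "configurable retry attempts" idea without claiming anything about how the real tool wires it up.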

▲

gburgett 3 hours ago | parent | prev | next [-]

Totally agreed! I've had good success using Claude Code with Cucumber, where I start with the spec and have Claude iterate on the code. How does Ossature compare to that approach?

▲

peterm4 10 hours ago | parent | prev | next [-]

This looks great, and I’ve bookmarked to give it a go.

Any reason you’ve opted for custom markdown formats with the @ syntax rather than using something like frontmatter?

Very conscious that this would prevent any markdown rendering in github etc.

▲

beshrkayali 9 hours ago | parent [-]

I've answered this exact question in a previous HN comment thread a few weeks ago. Maybe I should reconsider front matter? My previous answer:

> Yeah, I did briefly consider front-matter, but ended up with inline @ tags because I thought it kept the entire document feeling like one coherent spec instead of header-data + body, front matter felt like config to me, but this is 0.0.1 so things might change :)

	▲	4b11b4 3 hours ago \| parent [-]
		need both

▲

4b11b4 3 hours ago | parent | prev | next [-]

nice but can't be only text based

▲

straydusk 4 hours ago | parent | prev | next [-]

This is basically what Augment Intent is

▲

dboreham 9 hours ago | parent | prev [-]

Waterfall!

	▲	AnimalMuppet 7 hours ago \| parent [-]
		There are two problems with waterfall. First, if it takes too long to implement, the world moved on and your spec didn't move. Second, there are often gaps in the spec, and you don't discover them until you try to implement it and discover that the spec doesn't specify enough.
		Well, for the first problem, if an AI can generate the code in a day or a week, the world hasn't moved very much in that time. (In the future, if everything is moving at the speed of AI, that may no longer be true. For now it is.)
		The second problem... if Ossature (or equivalent) warns you of gaps rather than just making stuff up, you could wind up with iterative development of the spec, with the backend code generation being the equivalent of a compiler pass. But at that point, I'm not sure it's fair to call it "waterfall". It's iterative development of the spec, but the spec is all there is - it's the "source code".