joegaebel 7 hours ago

In my view, spec-driven systems are doomed to fail. There's nothing that couples the English-language specs you've written with the actual code and behaviour of the system - unless your agent is being insanely diligent and constantly checking whether the entire system aligns with your specs.

This has been solved already - automated testing. Automated tests encode the behaviour of the system into executables which actually tell you whether your system aligns or not.

Better to encode the behaviour of your system into real, executable, scalable specs (aka automated tests), otherwise your app's behaviour is going to spiral out of control after the Nth AI generated feature.
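
A minimal sketch of what such an executable spec could look like. The `apply_discount` function and its rule are hypothetical, invented purely for illustration: the English sentence "orders of 10 or more items get 10% off" becomes assertions that either pass or fail against the real code.

```python
def apply_discount(quantity: int, unit_price_cents: int) -> int:
    """Hypothetical rule, for illustration only: 10 or more items get 10% off."""
    total = quantity * unit_price_cents
    if quantity >= 10:
        # Integer arithmetic on cents avoids floating-point rounding surprises.
        return total * 90 // 100
    return total


def test_bulk_orders_get_ten_percent_off():
    # 10 items at 500 cents each: 5000 cents, minus 10%.
    assert apply_discount(10, 500) == 4500


def test_small_orders_pay_full_price():
    # 3 items at 500 cents each: no discount applies.
    assert apply_discount(3, 500) == 1500
```

A test runner such as pytest would pick up the `test_` functions; if a later change breaks the discount rule, the failure shows up on the next run rather than in a stale document.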

The way to ensure this actually scales with the firepower that LLMs have for writing implementation is to ensure the agent follows a workflow where it knows how to test, writes the tests first, and uses mutation testing to confirm that the tests actually reflect the behaviour of the system.

I've scoped this out here [1] and here [2].

[1] https://www.joegaebel.com/articles/principled-agentic-softwa...

[2] https://github.com/JoeGaebel/outside-in-tdd-starter

oakpond 2 hours ago

Sort of agreed. Natural language specs don't scale. They can't be used to accurately model and verify the behavior of complex systems. But they can be used as a guide to create formal language specs that can be used for that purpose. As long as the formal spec is considered to be the ground truth, I think it can scale. But yeah, that means some kind of code will be required.. :)

zby 7 hours ago

Spec Driven Development is a curious term - it suggests it is a kind of, or at least in the tradition of, Test Driven Development but it goes in the opposite direction!

sveme 6 hours ago

Don't understand this - you can go spec -> test -> implementation and establish the test loop. A bit like the V-model of old, actually.

joegaebel 3 hours ago

In my view, the problems with specs are:

1. Specs are subject to bit-rot. There's no impetus to update them as behaviour changes - unless your agent workflow explicitly enforces a thorough review and update of the specs, and unless your agent is diligent about following it. Lots of trust required in your LLM here.

2. There's no way to systematically determine if the behaviour of your system matches the specs. Imagine a reasonably sized codebase - if there's a spec document for every feature, you're looking at quite a collection of specs. How many tokens need to be burnt to ensure that these specs are always up to date as new features come in and behaviour changes?

3. Specs are written in English. They're ambiguous - they can absolutely serve the planning and design phases, but this ambiguity prevents meaningful behaviour assertions about the system as it grows.
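
To illustrate the ambiguity point with a hedged, made-up example: the spec sentence "orders over $100 ship free" admits two reasonable readings, and only an executable check decides between them. All names and the threshold below are hypothetical.

```python
from typing import Callable

FREE_SHIPPING_THRESHOLD_CENTS = 100_00  # hypothetical figure for illustration


def free_shipping_strict(total_cents: int) -> bool:
    # Reading A: "over $100" means strictly more than $100.
    return total_cents > FREE_SHIPPING_THRESHOLD_CENTS


def free_shipping_inclusive(total_cents: int) -> bool:
    # Reading B: "over $100" also covers exactly $100.
    return total_cents >= FREE_SHIPPING_THRESHOLD_CENTS


def check_boundary_spec(free_shipping: Callable[[int], bool]) -> None:
    # Both readings agree away from the boundary...
    assert free_shipping(150_00) is True
    assert free_shipping(50_00) is False
    # ...but this line commits to Reading A at exactly $100.
    assert free_shipping(100_00) is False
```

Calling `check_boundary_spec(free_shipping_strict)` passes, while `check_boundary_spec(free_shipping_inclusive)` raises `AssertionError`. The English sentence is satisfied by both implementations; the test is satisfied by only one.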

Contrast that with tests:

1. They are executable and have the precision of code. They don't just describe behaviour of the system, they validate that the system follows that behaviour, without ambiguity.

2. They scale - it's completely reasonable for extensive codebases to have most (if not all) of their behaviour covered by tests.

3. Updating is enforceable - assuming you're using a CI pipeline, when tests break, they must be updated in order to continue.

4. You can systematically determine whether the tests fully describe the behaviour (i.e. whether all the behaviour is tested) via mutation testing. This will tell you with absolute certainty if code is tested or not - that is, whether the tests fully describe the system's behaviour.
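
As a rough, hand-rolled sketch of the idea behind mutation testing (real tools such as mutmut or Stryker generate and run mutants automatically; every name below is hypothetical): a small change is injected into the code, and a test suite only "kills" the mutant if it fails against it.

```python
from typing import Callable

Predicate = Callable[[int], bool]


def is_adult(age: int) -> bool:
    # Original implementation.
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Mutant: ">=" swapped for ">", so the boundary case behaves differently.
    return age > 18


def weak_suite(fn: Predicate) -> bool:
    # Checks only values far from the boundary.
    return fn(30) is True and fn(5) is False


def strong_suite(fn: Predicate) -> bool:
    # Adds the boundary case that the weak suite misses.
    return weak_suite(fn) and fn(18) is True


def mutant_killed(suite: Callable[[Predicate], bool], mutant: Predicate) -> bool:
    # A mutant is "killed" when the suite fails against it.
    return not suite(mutant)
```

Both suites pass against `is_adult`, but `mutant_killed(weak_suite, is_adult_mutant)` is `False` (the mutant survives, so the boundary behaviour is untested), while `mutant_killed(strong_suite, is_adult_mutant)` is `True`.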

That being said, I think it's very valuable to start with a planning stage, even to provide a spec, such that the correct behaviour gets encoded into tests, and then instantiated by the implementation. But in my view, specs are best used within the design stage, and if left in the codebase, treated only as historical info for what went into the development of the feature. Attempting to use them as the source of truth for the behaviour of the system is fraught.

And I guess finally, I think that insofar as any framework uses the specs as the source of truth for behaviour, it's going to run into alignment problems, since maintaining specs doesn't scale.

zby 5 hours ago

SDD is about flowing the design choices from the spec into the rest of the system. TDD was for making sure that the inevitable changes you make to the system later don't break your earlier assumptions - or at least warn you that you need to change them. Personally I don't buy TDD. It might be useful sometimes, but it is kind of extreme. In general, though, agile methodologies were a reaction to the waterfall model of system development.

internet_points 3 hours ago

See also the recent post "A sufficiently detailed spec is code", which tried and failed to reproduce OpenAI's spec results: https://hn.algolia.com/?q=https%3A%2F%2Fhaskellforall.com%2F...

j45 7 hours ago

Specs seem to be more about alignment and clarity, which increase the chance of getting code that works and improve the success of tests.

locknitpicker 7 hours ago

> This has been solved already - automated testing.

This is specious reasoning. Automated tests are already the output of these specs, and specs cover way more than what you cover with code.

Framing tests as the feedback that drives design is also a baffling opinion. Without specialized prompts such as specs, your LLM agent of choice ends up either ignoring tests altogether or even changing them to fit its own baseless assumptions.

I mean, who hasn't stumbled upon the infamous "the rest of your tests go here" output in automated tests?

polytely 5 hours ago

> Automated tests are already the output of these specs, and specs cover way more than what you cover with code.

ok but how are you sure that the AI is correctly turning the spec into tests. if it makes a mistake there and then builds the code in accordance with the mistaken test, you only get the illusion of a correct implementation

locknitpicker 4 hours ago

> ok but how are you sure that the AI is correctly turning the spec into tests.

You use the specs to generate the tests, and you review the changes.

mattmanser 6 hours ago

I've seen a few comments recently that start with:

> This is specious reasoning

It's an insulting phrase and from now on I'm immediately downvoting it when I see it.

nelox 5 hours ago

On the face of it, it is insulting, until you dig a little deeper.

locknitpicker 4 hours ago

> It's an insulting phrase (...)

I'm sorry you feel like that. How would you phrase an observation where you find the rationale for an assertion to not be substantiated and supported beyond surface level?