| ▲ | joegaebel 4 hours ago | |
In my view, the problem with specs are: 1. Specs are subject to bit-rot, there's no impetus to update them as behaviour changes - unless your agent workflow explicitly enforces a thorough review and update of the specs, and unless your agent is diligent with following it. Lots of trust required on your LLM here. 2. There's no way to systematically determine if the behaviour of your system matches the specs. Imagine a reasonable sized codebase - if there's a spec document for every feature, you're looking at quite a collection of specs. How many tokens need be burnt to ensure that these specs are always up to date as new features come in and behaviour changes? 3. Specs are written in English. They're ambiguous - they can absolutely serve the planning and design phases, but this ambiguity prevents meaningful behaviour assertions about the system as it grows. Contrast that with tests: 1. They are executable and have the precision of code. They don't just describe behaviour of the system, they validate that the system follows that behaviour, without ambiguity. 2. They scale - it's completely reasonable to have extensive codebases have all (if not most) of their behaviour covered by tests. 3. Updating is enforcable - assuming you're using a CI pipeline, when tests break, they must be updated in order to continue. 4. You can systematically determine if the tests fully describe the behaviour (ie. is all the behaviour tested) via mutation testing. This will tell you with absolute certainty if code is tested or not - do the tests fully describe the system's behaviour. That being said, I think it's very valuable to start with a planning stage, even to provide a spec, such that the correct behaviour gets encoded into tests, and then instantiated by the implementation. But in my view, specs are best used within the design stage, and if left in the codebase, treated only as historical info for what went into the development of the feature. Attempting to use them as the source of truth for the behaviour of the system is fraught. And I guess finally, I think that insofar as any framework uses the specs as the source of truth for behaviour, they're going to run into alignment problems since maintaining specs doesn't scale. | ||