thomascountz 9 hours ago

I just tried an experiment using Spec-Kit from GitHub to build a CLI tool. Perhaps the scope of the tool doesn't align with Spec-Driven Development, but I found the many, many hours—tweaking, asking, correcting, analyzing, adapting, refining, reshaping, etc.—before getting to see any code challenging. As would be the case with Waterfall today, the lack of iterative end-to-end feedback is foreign and frustrating to me.

After Claude finally produced a significant amount of code, and after realizing it hadn't built the right thing, I was back to the drawing board to find out what language in the spec had led it astray. Never mind digging through the code at this point; it would be just as good to start again as to try to onboard myself to the thousands of lines of code it had built... and I suppose the point is to ignore the code as "implementation detail" anyway.

Just to be clear: I love writing code with an LLM, be it for brainstorming, research, or implementation. I often write—and have it output—small markdown notes and plans for it to ground itself. I think I just found this experience with SDD quite heavy-handed and the workflow unwieldy.

ErrantX 8 hours ago | parent | next [-]

I did this first too. The trick is realising that the "spec" isn't a full system spec, per se, but a detailed description of what you want to do.

System specs are non-trivial for current AI agents. Hand-prompting every step is time-consuming.

I think (and I am still learning!) SDD sits as a fix for that. I can give it two fairly simple prompts & get a reasonably complex result. It's not a full system but it's more than I could get with two prompts previously.

The verbose "spec" stuff is just feeding the LLM's love of context, and, more importantly, it reflects what I think we all know: you have to tell an agent over and over how to get the right answer or it will deviate.

Early on with speckit I found I was clarifying a lot, but I've discovered that was just me not being so good at writing specs!

Example prompts for speckit:

(Specify) I want to build a simple admin interface. First I want to be able to access the interface, and I want to be able to log in with my Google Workspaces account (and you should restrict logins to my workspaces domain). I will be the global superadmin, but I also want a simple RBAC where I can apply a set of roles to any user account. For simplicity, let's create user account records when they first log in. The first roles I want are Admin, Editor and Viewer.
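The Specify prompt above asks for two concrete behaviors: a domain-restricted login and a simple RBAC with three roles. As an illustrative sketch only (the names, the domain, and the helpers here are hypothetical, not what the agent would actually generate), the core logic amounts to something like:

```typescript
// Illustrative sketch of the RBAC described in the Specify prompt.
// WORKSPACE_DOMAIN and all function names are made up for this example.
type Role = "Admin" | "Editor" | "Viewer";

interface User {
  email: string;
  roles: Role[];
}

const WORKSPACE_DOMAIN = "example.com"; // assumption: your Workspaces domain

// Restrict logins to the workspace domain.
function canLogIn(email: string): boolean {
  return email.endsWith(`@${WORKSPACE_DOMAIN}`);
}

// Simple RBAC check: the user may act if any of their roles is allowed.
function hasRole(user: User, allowed: Role[]): boolean {
  return user.roles.some((r) => allowed.includes(r));
}

const alice: User = { email: "alice@example.com", roles: ["Editor"] };
console.log(canLogIn(alice.email)); // true
console.log(hasRole(alice, ["Admin", "Editor"])); // true
console.log(hasRole(alice, ["Admin"])); // false
```

The point isn't that this code is hard; it's that the two-sentence prompt carries enough constraint (domain restriction, role set, record-on-first-login) for the agent to build everything around it.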

(Plan) I want to implement this as a NextJS app using the latest version of Next. Please also use Mantine for styling instead of Tailwind. I want to use DynamoDB as my database for this project, so you'll also need to use Auth.js over Better Auth. It's critical that when we implement you write tests first before writing code; forget UI tests, focus on unit and integration tests. All API endpoints should have a documented contract which is tested. I also need to be able to run the dev environment locally so make sure to localise things like the database.
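The Plan prompt's "documented contract which is tested" line is doing a lot of work. A hedged sketch of what that might mean in practice (in a real Next.js app you'd likely reach for a schema library like zod; this hand-rolled type guard and the endpoint shape are purely illustrative):

```typescript
// Illustrative: a "documented contract" for a hypothetical /api/users/:id
// endpoint, expressed as a runtime check the tests can assert against.
interface UserResponse {
  email: string;
  roles: string[];
}

// The contract check: does a response body match what the endpoint promises?
function matchesUserContract(body: unknown): body is UserResponse {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b.email === "string" &&
    Array.isArray(b.roles) &&
    b.roles.every((r) => typeof r === "string")
  );
}

// An integration test would run the real handler and assert its output:
console.log(matchesUserContract({ email: "alice@example.com", roles: ["Editor"] })); // true
console.log(matchesUserContract({ email: 42, roles: [] })); // false
```

Writing the contract down first is what lets the tests-before-code instruction in the Plan prompt actually bind the agent.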

skydhash 7 hours ago | parent [-]

Just picking on the example:

The plan step is overly focused on the accidental complexity of the project. While the `Specify` part does a good job of defining the scope, the `Plan` part just complicates it. Why? The choice of technology is usually the first step in introducing accidental complexity into a project. Which is why it's often recommended to go with boring technology (so the cost of this technical debt is known). Otherwise, go with something that is already used by the company (if it's a side project, do whatever). If you choose to go that route, there's a good chance you already have good knowledge of those tools and have code samples (and libraries) lying around.

The whole point of code is to be reliable and to help do something that we'd rather not do. Not to exist on its own. Every decision (even a little one) needs to be connected to a specific need that is tied to the project and the team. It should not be just a receptacle for wishes.

ErrantX 7 hours ago | parent [-]

I wouldn't call that accidental complexity? It's just a set of preferences.

Your last point feels a bit idealistic. The point of code is to achieve a goal; there are ways to achieve it with optimal efficiency in construction, but a lot of people call that gold plating.

The setup these prompts leave you with is boring, standard, and surely something I could do in a couple of hours. You might even skeleton it, right? The thing is, the AI is not only faster in elapsed time but also reduces my time to writing two prompts (<2 minutes) plus some review (10-15 minutes, perhaps?).

Also remember this was a simple example; once we get to real business logic, the efficiencies grow.

skydhash 6 hours ago | parent [-]

It may be a set of preferences for now, but it always grows into a monstrosity when future preferences don't align with current preferences. That's what accidental complexity means. Instead of working on the essential needs (having an admin interface that works well), you will get bogged down with the whims of the platform and technology (breaking changes, bugs,...). It may not be relevant to you if you're planning on abandoning it (switching jobs, a side project you no longer care about,...).

Something boring and standard is something that keeps going with minimal intervention while getting better each time.

Qworg 3 hours ago | parent | next [-]

Your team's fixed preferences get stored into your .agents.md file so you don't type it over and over.
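As a sketch, such a file might capture the preferences from the prompts upthread (the filename and contents here are illustrative, not a prescribed format):

```markdown
# Agent preferences (illustrative)

- Framework: latest Next.js; style with Mantine, not Tailwind
- Database: DynamoDB; auth via Auth.js (not Better Auth)
- Write unit and integration tests before implementation; skip UI tests
- Every API endpoint gets a documented, tested contract
- Keep the dev environment runnable locally (local database included)
```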

If you change your preferences, the team refactors.

ErrantX 3 hours ago | parent | prev [-]

I'm going to go out on a limb here and say NextJs with Auth.js is pretty boring technology.

I'm struggling to see what you'd choose to do differently here?

Edit: actually, I'll go further and say I'm guarding against accidental complexity. For example, Auth.js is really boring technology, but I am annoyed they've deprecated it in favour of Better Auth; it's not better and it is definitely not boring technology!

galaxyLogic 8 hours ago | parent | prev | next [-]

I think the challenge is how to create a small but evolvable spec.

What LLMs bring to the picture is that the "spec" is high-level coding. In normal coding you start by writing small functions, then verify that they work. Similarly, perhaps LLMs should be given small specs to start with, with more functions/features added to the spec incrementally. Would that work?

thomascountz 8 hours ago | parent [-]

Thanks! With Spec-Kit and Claude Sonnet 4.5, it wanted to design the whole prod-ready CLI up front. It was hard, if not impossible, to scope it to just a single feature or POC. This is what I struggled with most.

Were I to try again, I'd do a lot more manual spec writing or even template rewrites. I expected it to work more-or-less out-of-the-box. Maybe it would've for a standard web app using a popular framework.

It was also difficult to know where one "spec" ended and the next began; should I iterate on the existing one or create a new spec? This might be a solved problem in other SDD frameworks besides Spec-Kit, or else I'm just overthinking it!

ares623 9 hours ago | parent | prev [-]

I think the problem is you still care about the craft. You need to let go and let the tide take you.

thomascountz 8 hours ago | parent | next [-]

I respect this take. As I understand it, in SDD the code is not the source of truth; it's akin to bytecode: an intermediary between the spec and the observable behavior.

nrhrjrjrjtntbt 8 hours ago | parent | prev [-]

The rip tide