lukebechtel 7 hours ago

Very cool!

Yep, a constantly updated spec is the key. Wrote about this here:

https://lukebechtel.com/blog/vibe-speccing

I've also found it's helpful to have it keep an "experiment log" at the bottom of the original spec, or in another document, which it must update whenever things take "a surprising turn".
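
Something like this at the tail of the spec (entries invented purely for illustration):

    ## Experiment Log
    - Day 3: switched storage from flat files to SQLite; sections 2 and 4
      of the spec updated to match.
    - Day 5: the retry queue turned out to be unnecessary; removed it from
      the spec and noted why here so we don't reintroduce it later.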

anonzzzies 43 minutes ago | parent | next [-]

We found, especially with Opus and recent Claude Code, that it's better/more precise at reading the existing code to figure out the current status than at reading specs. It seems (for us) less precise at "comprehending" the spec's English than the code, which sometimes shows up as wrong assumptions for new tasks and, in turn, incorrect implementations of those tasks. So we dropped this. Because of caching, it doesn't seem too bad on tokens either.

ctoth 5 hours ago | parent | prev | next [-]

Honest question: what do you do when your spec has grown to over a megabyte?

Some things I've been doing:

- Move as much actual data into YML as possible.

- Use CEL?

- Ask Claude to rewrite pseudocode in specs into RFC-style constrained language?
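
  For example (a made-up fragment, not from my actual specs), turning loose pseudocode into RFC 2119-style requirements:

    The importer MUST reject files larger than 10 MB.
    It SHOULD report which rule failed, and MAY suggest a fix.
    It MUST NOT silently truncate input.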

How do you sync your spec and code in both directions? I have some slash commands that do this, but I'm not thrilled with them.

I tend to have to use Gemini to actually juggle the whole spec. Of course it's chunked as nicely as it can be, but still. There's gonna need to be a whole new way of doing this.

If programming languages can have spooky action at a distance, wait until we get into "but paragraph 7, subsection 5 of section G clearly defines asshole as..."

What does a structured language look like when it doesn't need mechanical sympathy? YML + CEL is really powerful and underexplored but it's still just ... not what I'm actually wanting.
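
By YML + CEL I mean roughly this kind of thing (hypothetical fragment; the field names and the embedding are made up, the expr strings are plain CEL):

    limits:
      max_items: 64
      max_depth: 4
    rules:
      - name: items-within-budget
        expr: "size(items) <= 64"
      - name: every-item-has-an-id
        expr: "items.all(i, i.id != '')"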

lukebechtel 5 hours ago | parent [-]

Sharding or compaction, both possible with LLMs.

Sharding: Make well-named sub-documents for separate parts of the work. The LLM will happily create these and maintain cross-references for you.

Compaction: Ask the LLM to compact parts of the spec, or the changelog, that are over-specified or redundant.
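
With sharding, the top-level spec ends up as a thin index, something like this (file names invented for illustration):

    # SPEC.md (index only)
    - Storage model   -> spec/storage.md
    - Query planner   -> spec/planner.md  (depends on: storage)
    - Wire format     -> spec/wire.md
    - Experiment log  -> spec/log.md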

ctoth 5 hours ago | parent [-]

My question was something like: what is the right representation for program semantics when the consumer is an LLM and the artifact exceeds context limits?

"Make sub-documents with cross-references" is just... recreating the problem of programming languages but worse. Now we have implicit dependencies between prose documents with no tooling to track them, no way to know if a change in document A invalidates assumptions in document B, no refactoring support, no tests for the spec.

To make things specific:

https://github.com/ctoth/polyarray-spec

lukebechtel 5 hours ago | parent [-]

Ah, I see your point more clearly now.

At some level you have to do semantic compression... To your point on non-explicitness -- the dependencies between the spec and sub-specs can be made explicit (e.g. file:// links, etc.).

But your overall point on assumption invalidation remains... It reminds me of a startup a while back doing "Automated UX Testing": user personas were created (e.g. prosumer, average joe), and goals / implicit UX flows through the UI were described (e.g. "I want to see my dashboard"). Then an LLM could pretend to be each persona and test, each day, whether that user type could still achieve the goals behind their flow.

This doesn't fully solve your problem, but it hints at a solution perhaps.
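
The shape of that loop is roughly this (rough Python sketch; ask_llm, the personas, and the goals are stand-ins of mine, not what that startup actually did):

    # Sketch of persona-based UX checks driven by an LLM.
    # `ask_llm` is a placeholder for whatever model call you use.
    personas = {
        "prosumer": "Power user, lives in keyboard shortcuts, low patience.",
        "average joe": "Occasional user, only follows visible buttons.",
    }
    goals = ["I want to see my dashboard", "I want to export last month's data"]

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("call your model of choice here")

    def run_ux_checks(ui_description: str) -> list[tuple[str, str, str]]:
        results = []
        for persona, traits in personas.items():
            for goal in goals:
                verdict = ask_llm(
                    f"You are this user: {traits}\n"
                    f"Given this UI: {ui_description}\n"
                    f"Can you achieve: '{goal}'? Answer PASS or FAIL and list your steps."
                )
                results.append((persona, goal, verdict))
        return results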

Some of what you're looking for comes from adding a strict linter and tests. But your repo looks like something in an entirely different paradigm, and I'm curious to dig into it more.

celadin 2 hours ago | parent | prev | next [-]

I'm still sharing this post in the internal org trainings I run for those new to LLMs. Thanks for it - really great overview of the concept!

I saw in your other comment that you've made accommodations for the newer generation, and I will confess that in Cursor (with plan mode) I've found an abbreviated form works just as well as the extremely explicit example found in the post.

If you ever write a follow-up, I imagine it'd be just as well received!

daliusd 5 hours ago | parent | prev [-]

Looks like default OpenCode / Claude Code behavior with Claude models. Why the extra prompt?

lukebechtel 5 hours ago | parent [-]

Good question!

1. The post was written before this was common :)

2. If you're using Cursor (as I usually am), this isn't what it always does by default, though you can invoke something like it using "plan" mode. Its default is to keep todo items in a nice little todo list, but that isn't the same thing as a spec.

3. I've found that Claude Code doesn't always do this, for reasons unknown to me.

4. The prompt is completely fungible! It's really just an example of the idea.