giancarlostoro 3 days ago

Claude, or whatever agent you use, gets a message when it tries to close a task telling it which gates are not resolved yet, at which point the agent will instinctively want to read the task. I did run into an issue where I forgot to add gates to a new project, so Claude smooshed over it by making a blanket gate. Otherwise I have never had an issue when I defined what the gate is; Claude usually honors it. I haven't worked on big updates recently, but I noticed other tools like rtk (Rust Token Killer) will add their own instructions to your Claude's instructions.md file, so I think I need to craft one to tack on with sane instructions, including never closing tasks without having the user create gates for them first.

In a nutshell, a gate is an entry in the DB with arbitrary text, and Claude is good about following whatever it says. Trying to close a task forces Claude to read it.
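A rough sketch of that idea, not taken from the GuardRails repo: the table layout, the `open_db` and `close_task` names, and the message wording are all assumptions made up for illustration. It only shows the shape of "closing a task returns the unresolved gate text to the agent", plus the rule of refusing to close a task that has no gates at all.

```python
import sqlite3


def open_db():
    """Create an in-memory DB with hypothetical tasks and gates tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            closed INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE gates (
            id INTEGER PRIMARY KEY,
            task_id INTEGER NOT NULL REFERENCES tasks(id),
            text TEXT NOT NULL,              -- arbitrary text the agent must honor
            resolved INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return db


def close_task(db, task_id):
    """Try to close a task. Returns (closed, message shown to the agent)."""
    total = db.execute(
        "SELECT COUNT(*) FROM gates WHERE task_id = ?", (task_id,)
    ).fetchone()[0]
    if total == 0:
        # Assumed policy: no gates means the user has not defined "done" yet.
        return False, "Task has no gates. Ask the user to define gates first."

    unresolved = [
        row[0]
        for row in db.execute(
            "SELECT text FROM gates WHERE task_id = ? AND resolved = 0 ORDER BY id",
            (task_id,),
        )
    ]
    if unresolved:
        # The gate text lands in the agent's context right when it tries to finish.
        lines = "\n".join(f"- {text}" for text in unresolved)
        return False, "Unresolved gates:\n" + lines

    db.execute("UPDATE tasks SET closed = 1 WHERE id = ?", (task_id,))
    return True, "Task closed."
```

The point of returning the gate text in the refusal message is that the agent sees it exactly when it believes it is done.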

Life's gotten slightly busy, but you can see more on the repo. I've been debating giving it a better name; I feel like GuardRails implies security, when the goal is just to validate work slightly.

https://github.com/Giancarlos/GuardRails

skybrian 3 days ago | parent | next [-]

It sounds like a gate is a prompt that shows up at the appropriate time, which works because LLMs pay more attention to the last thing they read.

It seems like a lot of coding agent features work that way?

giancarlostoro 3 days ago | parent [-]

I suppose. I mean, the LLM is still reading it; the issue is that Beads gives the model a task, and then the model finishes and never checks anything. I kept running into this repeatedly, and sometimes when I'd go to compile the project after it said "hey I finished", it wouldn't compile at all, whereas if it had just tried to build the project, it would have just worked.

0x457 a day ago | parent [-]

From my understanding, the way Gas Town uses beads, a bead is not only "what to do" but also contains a workflow.

maleldil 3 days ago | parent | prev [-]

Who closes the gate? Is it Claude itself after it runs the verification? Who makes sure the verification did in fact run?

giancarlostoro 3 days ago | parent [-]

I usually have Claude confirm with me, but I've seen it close a gate itself if it's a unit test that passed, for example.

maleldil 2 days ago | parent [-]

You can't trust it 100%. Sometimes it will just refuse to fix a compiler or lint warning (often saying "This was a pre-existing issue...") or write a trivial test that does nothing and always passes.

0x457 a day ago | parent | next [-]

> writes code with a lot of warnings
> compacts
> "This was a pre-existing issue..."

I still take this over writing code myself though.

maleldil 6 hours ago | parent [-]

I'm not saying you shouldn't. I'd say 70% of my work code is written by Claude Code or Codex. But this is something you should be aware of when interacting with agents.

giancarlostoro 2 days ago | parent | prev [-]

Point being that there are multiple gates to one story, including human testing as one of them.
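To make the "multiple gates, one of them human" point concrete, here is a minimal sketch. The `Gate` and `Story` classes, the `human_only` flag, and the `actor` strings are hypothetical names invented for this example and are not GuardRails' actual model. The idea is simply that the agent can resolve automated gates but cannot sign off on the human one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gate:
    text: str                 # arbitrary description of what must be verified
    human_only: bool = False  # assumed flag: only a person may resolve this gate
    resolved: bool = False


@dataclass
class Story:
    title: str
    gates: List[Gate] = field(default_factory=list)

    def resolve(self, index: int, actor: str) -> bool:
        """Mark a gate resolved; refuse if an agent tries a human-only gate."""
        gate = self.gates[index]
        if gate.human_only and actor != "human":
            return False
        gate.resolved = True
        return True

    def can_close(self) -> bool:
        """A story closes only when it has gates and all of them are resolved."""
        return bool(self.gates) and all(g.resolved for g in self.gates)
```

With a story holding one "unit tests pass" gate and one human-only "user tested the flow" gate, the agent resolving the first still leaves `can_close()` false until a person resolves the second. This does not stop an agent from writing a trivial test to satisfy an automated gate, which is why the human gate is kept separate.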