javawizard 3 days ago

Love this.

How did you implement gates? Are they simply tasks Claude itself has to confirm it ran, or are they scripts that run to check that the thing in question actually happened, or do they spawn a separate AI agent to check that the thing happened, or what?

giancarlostoro 3 days ago | parent | next [-]

Claude, or whatever agent you use, gets a message when it tries to close a task, telling it which gates are not resolved yet, at which point the agent will instinctively want to read the task. I did run into an issue where I forgot to add gates to a new project, so Claude smooshed over it by making a blanket gate. Otherwise I have never had an issue when I defined what the gate is; Claude usually honors it. I haven't worked on big updates recently, but I noticed other tools like rtk (Rust Token Killer) will add their own instructions to your Claude's instructions.md file, so I think I need to craft one to tack on with sane instructions, including never closing tasks without having the user create gates for them first.

In a nutshell, a gate is an entry in the DB with arbitrary text, and Claude is good about following whatever it says. Trying to close a task forces Claude to read it.
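A minimal sketch of that idea, for illustration only and not the actual GuardRails code: gates are rows of free text in a SQLite table, and a close attempt returns a message listing whatever is still unresolved. The table layout and the `open_db` and `close_task` names are hypothetical, and refusing to close a task with no gates at all is an assumption based on the rule described above.

```python
import sqlite3


def open_db():
    """Create an in-memory database with hypothetical tasks and gates tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            closed INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE gates (
            id INTEGER PRIMARY KEY,
            task_id INTEGER NOT NULL REFERENCES tasks(id),
            text TEXT NOT NULL,              -- arbitrary instructions for the agent
            resolved INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return db


def close_task(db, task_id):
    """Try to close a task; return the message the agent would see."""
    rows = db.execute(
        "SELECT text, resolved FROM gates WHERE task_id = ? ORDER BY id",
        (task_id,),
    ).fetchall()
    if not rows:
        # Assumption: a task without any gates may not be closed.
        return "Cannot close task: no gates defined. Ask the user to create gates first."
    unresolved = [text for text, resolved in rows if not resolved]
    if unresolved:
        # The gate text is echoed back, so the agent reads it at close time.
        listing = "\n".join(f"- {text}" for text in unresolved)
        return "Cannot close task; unresolved gates:\n" + listing
    db.execute("UPDATE tasks SET closed = 1 WHERE id = ?", (task_id,))
    db.commit()
    return "Task closed."
```

Nothing here verifies that a gate was really satisfied. Whoever marks the row as resolved, agent or human, is trusted.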

Life's gotten slightly busy, but you can see more on the repo. I've been debating giving it a better name; I feel like GuardRails implies security, when the goal is just to validate work slightly.

https://github.com/Giancarlos/GuardRails

skybrian 3 days ago | parent | next [-]

It sounds like a gate is a prompt that shows up at the appropriate time, which works because LLMs pay more attention to the last thing they read.

It seems like a lot of coding agent features work that way?

giancarlostoro 2 days ago | parent [-]

I suppose. I mean, the LLM is still reading it. The issue is that Beads gives the model a task, and then the model finishes and never checks anything. I kept running into this, and sometimes I'd go to compile the project after it said "hey I finished" and it wouldn't compile at all, whereas if it had just tried to build the project, it would have just worked.

0x457 a day ago | parent [-]

From my understanding, the way Gas Town uses beads is that a bead is not only "what to do" but also contains a workflow.

maleldil 3 days ago | parent | prev [-]

Who closes the gate? Is it Claude itself after it runs the verification? Who makes sure the verification did in fact run?

giancarlostoro 2 days ago | parent [-]

I usually have Claude confirm with me, but I've seen it close a gate itself if it's a unit test that passed, for example.

maleldil 2 days ago | parent [-]

You can't trust it 100%. Sometimes it will just refuse to fix a compiler or lint warning (often saying "This was a pre-existing issue...") or write a trivial test that does nothing and always passes.

0x457 a day ago | parent | next [-]

> writes code with a lot of warnings
> compacts
> "This was a pre-existing issue..."

I still take this over writing code myself though.

maleldil 3 hours ago | parent [-]

I'm not saying you shouldn't. I'd say 70% of my work code is written by Claude Code or Codex. But this is something you should be aware of when interacting with agents.

giancarlostoro 2 days ago | parent | prev [-]

Point being that there are multiple gates to one story, including human testing as one of them.

wyre 2 days ago | parent | prev [-]

I built something similar with verifiable gates on tasks. The agent has a command to mark the task as done, which runs the bash script. If it passes, the task closes; if it doesn't, the failure information is appended to the task description for the agent's next attempt at the task.
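A rough sketch of that flow, not the commenter's actual tool: the `Task` class, the `mark_done` function, and the default timeout are all hypothetical. The gate is any command whose exit code decides the outcome, for example `["bash", "check.sh"]`.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    gate_command: list  # argv of the gate script, e.g. ["bash", "check.sh"]
    closed: bool = False


def mark_done(task, timeout=300):
    """Run the gate command; close the task only if it exits with status 0."""
    result = subprocess.run(
        task.gate_command,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode == 0:
        task.closed = True
        return True
    # On failure, record the output so the next attempt starts with it.
    failure = (result.stdout + result.stderr).strip()
    task.description += f"\n\nGate failed (exit {result.returncode}):\n{failure}"
    return False
```

Because the exit code comes from the script and not from the agent's own report, the agent cannot close the task just by claiming the check passed. It can still cheat by writing a check that always passes, as noted earlier in the thread.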