cadamsdotcom 3 hours ago

This is easily solved with good error messages.

Claude always gets the syntax wrong on my tool calls.

So I did a revolutionary thing and made the error output print helpful guidance on how to correctly call the tool.

The agent tries again and always gets it right. Total time “wasted”: 1-2 seconds. It happens every session, but it only happens once per context window. After that the agent holds on to the lesson.

To do this for your own tool calls, imagine what you’d do in the agent’s place - what info you’d need so you can correct your mistake. Assume the agent wants to achieve the goal so it’ll try again. These are probabilistic systems, so we need to give them an extra loop to get the deterministic bits right.
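A minimal sketch of what such an error message might look like, assuming a hypothetical tool that takes a `test_name` and a `status`. The schema, argument names, and the `check_call` helper are all invented for illustration and are not taken from any real tool. The idea is that a failed call explains what was wrong and shows a correct call, so the agent can fix it on the retry.

```python
import json
from typing import Optional

# Hypothetical tool schema: names and types are made up for this example.
SCHEMA = {"test_name": str, "status": str}
ALLOWED_STATUS = ("pass", "fail")
EXAMPLE = {"test_name": "test_login", "status": "fail"}


def check_call(args: dict) -> Optional[str]:
    """Return None for a valid call, otherwise an error message with guidance."""
    problems = []
    # Report every missing or mistyped argument, not just the first one.
    for name, expected in SCHEMA.items():
        if name not in args:
            problems.append(f"missing required argument '{name}'")
        elif not isinstance(args[name], expected):
            got = type(args[name]).__name__
            problems.append(f"'{name}' must be {expected.__name__}, got {got}")
    # Flag arguments the tool does not know about (often a misremembered name).
    for name in args:
        if name not in SCHEMA:
            problems.append(f"unknown argument '{name}'")
    status = args.get("status")
    if isinstance(status, str) and status not in ALLOWED_STATUS:
        problems.append(f"'status' must be one of {list(ALLOWED_STATUS)}, got '{status}'")
    if not problems:
        return None
    # Spell out the expected arguments and a working example for the retry.
    lines = ["Invalid tool call:"]
    lines += [f"  - {p}" for p in problems]
    lines.append("Expected arguments: " + ", ".join(SCHEMA))
    lines.append("Example of a correct call: " + json.dumps(EXAMPLE))
    return "\n".join(lines)
```

Calling `check_call({"name": "test_login", "status": "maybe"})` returns a message listing all three problems plus the example call, which is the kind of output you would print to stderr before exiting non-zero.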

siwatanejo 2 hours ago | parent | prev | next [-]

So, are you saying that skills are not such a good tool for agents to learn, they still need tool-trial-and-error dance after injecting them? (I'm assuming each tool comes with its own skill.)

sdesol an hour ago | parent | next [-]

> they still need tool-trial-and-error dance after injecting them?

It honestly depends on the model. For my pi-brains extension for pi

https://github.com/gitsense/pi-brains

I've found that after the first hook injection they get it. Occasionally the model forgets, but since everything is driven by hooks, you can inject as often as needed.

The issue with skills is that they are a one-time thing, so you really can't use them to correct behavioral issues.

cadamsdotcom an hour ago | parent | prev [-]

I don't need to waste tokens on skills; I use Claude Code hooks.

Have a look at the TDD guard at https://codeleash.dev - the scripts/tdd_log.py arguments are pretty specific but it also has guidance in CLAUDE.md and lots of helpful error messages.

esafak 2 hours ago | parent | prev [-]

LSPs and linters serve the same purpose. I use the latter in git hooks.