jimbo808 a day ago

Claude is pretty good at totally disregarding most of what’s in your CLAUDE.md, so I’m not optimistic. For example, a project I work on gives it specific scripts to run for automated tests, because the project is set up in a way that requires some special things to happen before tests will work correctly. I’ve never once seen it actually call those scripts on the first try. It always tries to run the tests using the typical command that doesn’t work with our setup, and I have to remind it what the correct thing to run is.

losvedir 11 hours ago | parent | next [-]

That's kind of the opposite of what I mean. CLAUDE.md is (ostensibly) always loaded into the context window so it affects everything the model does.

I'm suggesting a POTENTIAL_TOOLS.md file that is not loaded into the context, but which Claude knows the existence of. That file would be an exhaustive list of all the tools you use, but which would be too many tokens to have perpetually in the context.

Finally, Claude would know - while it's planning - to invoke a sub-agent to read that file with a high level idea of what it wants to do, and let the sub-agent identify the subset of relevant tools and return those to the main agent. Since it was the sub-agent that evaluated the huge file, the main agent would only have the handful of relevant tools in its context.
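To make the selection step concrete, here is a minimal sketch of what the sub-agent would do with POTENTIAL_TOOLS.md. It assumes a hypothetical file layout (one `## tool-name` heading per tool, followed by a description) and uses plain keyword overlap as a stand-in for the sub-agent's judgment; the function names and the example tools are invented for illustration and are not part of Claude Code.

```python
import re


def parse_tools(markdown_text):
    """Parse '## name' headings and the description lines under them."""
    tools = {}
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            tools[current] = []
        elif current is not None and line.strip():
            tools[current].append(line.strip())
    return {name: " ".join(desc) for name, desc in tools.items()}


def select_relevant_tools(markdown_text, goal, limit=3):
    """Return up to `limit` (name, description) pairs that share words with `goal`.

    Keyword overlap is only a placeholder for what a sub-agent would decide.
    """
    goal_words = set(re.findall(r"[a-z0-9]+", goal.lower()))
    scored = []
    for name, desc in parse_tools(markdown_text).items():
        words = set(re.findall(r"[a-z0-9]+", (name + " " + desc).lower()))
        score = len(goal_words & words)
        if score > 0:
            scored.append((score, name, desc))
    # Highest overlap first; ties broken alphabetically by tool name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, desc) for _, name, desc in scored[:limit]]


# Hypothetical contents of POTENTIAL_TOOLS.md.
EXAMPLE = """\
## run-tests
Run the automated tests with the project's special setup.

## deploy-staging
Deploy the current branch to the staging server.

## db-migrate
Apply pending database migrations.
"""

if __name__ == "__main__":
    for name, desc in select_relevant_tools(EXAMPLE, "run automated tests"):
        print(f"{name}: {desc}")
```

Only the short list returned by `select_relevant_tools` would go back to the main agent, so the full file never enters its context.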

snek_case 20 hours ago | parent | prev | next [-]

I've had a similar experience with Gemini ignoring things I've explicitly told it (sometimes more than once). It's probably context rot. LLMs give you a huge advertised number of tokens in the context, but the more stuff you put in there, the less reliably the model remembers everything, which makes sense given how transformer attention blocks work internally.

cerved 19 hours ago | parent | prev | next [-]

Claude is pretty good at forgetting to run maven with -am flag, writing bash with heredocs that its interpreter doesn't weird out on, using the != operator in jq. Maybe Claude has early onset dementia.

vendiddy 16 hours ago | parent [-]

Demented AIs running amok is just what we need in this day and age.

airspresso 4 hours ago | parent | prev | next [-]

Sounds like you're fighting the weights. What would it take to align the setup with what the LLM expects?

notpublic 16 hours ago | parent | prev | next [-]

Instead of including all these instructions in CLAUDE.md, have you considered using custom Skills? I’ve implemented something similar, and Skills work really well. The only downside is that they may consume more tokens.

stpedgwdgfhgdd 14 hours ago | parent | next [-]

The matching logic for a skill is pretty strict. I wonder whether mentioning ‘git’ in the front matter and using ‘gitlab’ would give a match for a skill to get triggered.

taytus 14 hours ago | parent | prev [-]

Yes, sometimes skills are more reliable, but not always. That is the biggest problem for me so far. The fact that you cannot reliably trust these LLMs to follow steps or instructions makes them unsuitable for my applications.

notpublic 13 hours ago | parent [-]

Another thing that helps is adding a session hook that triggers on startup|resume|clear|compact to remind Claude about your custom skills. It keeps things consistent, especially when you're using it for a long time without clearing context.
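A rough sketch of what such a reminder hook's command could look like, in Python. It rests on two assumptions you should verify against your own setup: that skills live in folders containing a `SKILL.md` under `.claude/skills`, and that whatever a session-start hook prints to stdout gets added to Claude's context. The function names are made up for this example, and registering the script with the startup|resume|clear|compact matcher is left to your settings file.

```python
import sys
from pathlib import Path


def list_skills(skills_dir):
    """Return sorted names of subfolders that contain a SKILL.md file."""
    root = Path(skills_dir)
    if not root.is_dir():
        return []
    return sorted(p.parent.name for p in root.glob("*/SKILL.md"))


def build_reminder(skills_dir):
    """Build the reminder text, or an empty string if no skills are found."""
    names = list_skills(skills_dir)
    if not names:
        return ""
    lines = ["Reminder: this project defines custom skills. Check them before improvising:"]
    lines += [f"- {name}" for name in names]
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed default location; pass a different directory as the first argument.
    reminder = build_reminder(sys.argv[1] if len(sys.argv) > 1 else ".claude/skills")
    if reminder:
        print(reminder)
```

Because the list is rebuilt from disk on every run, the reminder stays current as skills are added or removed.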

nautilus12 14 hours ago | parent | prev [-]

I had the same problem. My CLAUDE.md eventually gets forgotten, and Claude forgets the best practices that I put in there. I've switched to using hooks that run it through a variety of things, like requiring testing. That seems to work better than CLAUDE.md because it has to run the hook every time it makes changes.

ewoodrich 5 hours ago | parent [-]

I really need something like this set up for tasks I want Claude to run before handing off a task to me as "complete". It routinely ignores my instructions on checklist items that need to be satisfied for a task to be considered successful. I have a helper script documented in CLAUDE.md that lets Claude or me get specific build/log outputs with a few one-liner commands, yet Claude can't be bothered to remember to run them half the time.

Way too frequently Claude goes, "The task is fully implemented, error-free with tests passing and no bugs or issues!" and I have to reply "did you verify server build/log outputs with run-dev per CLAUDE.md?". It immediately knows the command I am referencing from the instructions already buried in its context, notices an issue, and then goes back and fixes it correctly the second time. Whenever it happens, it instantly makes an agentic coding session go from feeling like breezy, effortless fun to pulling teeth.

I've started to design a subagent to handle chores after every task to avoid context pollution but it sounds like hooks are the missing piece I need to deterministically guarantee it will run every time instead of just when Claude feels the vibes are right.
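For what the deterministic version might look like, here is a minimal Python sketch of a hook command that runs a required check and refuses to pass when it fails. The command to run is taken from the arguments (for example the run-dev helper mentioned above, whose actual arguments are not given here). The exit code 2 as a "blocking" signal whose stderr is shown to the model is my understanding of Claude Code's hook convention and should be checked against the docs; `run_check` is a name invented for this example.

```python
import subprocess
import sys

# Assumption: the hook runner treats exit code 2 as "block and report stderr".
BLOCK = 2


def run_check(command):
    """Run `command` (a list of arguments); return (hook_exit_code, message)."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError:
        return BLOCK, "Required check could not be started: " + " ".join(command)
    if result.returncode == 0:
        return 0, ""
    # Keep only the last 20 lines of output so the feedback stays short.
    tail = (result.stdout + result.stderr).strip().splitlines()[-20:]
    message = "Required check failed: " + " ".join(command) + "\n" + "\n".join(tail)
    return BLOCK, message


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: check_hook.py COMMAND [ARGS...]")
    code, message = run_check(sys.argv[1:])
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

The point of the design is that the check runs because the hook fires, not because the model remembered to run it.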