losvedir a day ago

I never really understood why you have to stuff all the tools in the context. Is there something wrong with having all your tools in, say, a markdown file, and having a subagent read it with a description of the problem at hand and returning just the tool needed at that moment? Is that what this tool search is?

jimbo808 21 hours ago | parent | next [-]

Claude is pretty good at totally disregarding most of what’s in your CLAUDE.md, so I’m not optimistic. For example, a project I work on gives it specific scripts to run when it runs automated tests, because the project is set up in a way that requires some special things to happen before tests will work correctly. I’ve never once seen it actually call those scripts on the first try. It always tries to run them using the typical command that doesn’t work with our setup, and I have to remind it what the correct thing to run is.

losvedir 10 hours ago | parent | next [-]

That's kind of the opposite of what I mean. CLAUDE.md is (ostensibly) always loaded into the context window so it affects everything the model does.

I'm suggesting a POTENTIAL_TOOLS.md file that is not loaded into the context, but which Claude knows the existence of. That file would be an exhaustive list of all the tools you use, but which would be too many tokens to have perpetually in the context.

Finally, Claude would know - while it's planning - to invoke a sub-agent to read that file with a high level idea of what it wants to do, and let the sub-agent identify the subset of relevant tools and return those to the main agent. Since it was the sub-agent that evaluated the huge file, the main agent would only have the handful of relevant tools in its context.
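A minimal sketch of that flow in Python, with plain word overlap standing in for the sub-agent's judgment. The file layout (one `## name` heading per tool followed by a description), the function names, and the sample tools are all invented for illustration; a real sub-agent would be an LLM call, not a word count.

```python
import re

# Hypothetical contents of POTENTIAL_TOOLS.md. The "## name" + description
# layout is an assumption made for this sketch.
POTENTIAL_TOOLS_MD = """\
## run_tests
Run the project test suite with the required setup scripts.

## query_logs
Search server build and log output for errors.

## create_invoice
Create a billing invoice for a customer account.
"""


def parse_tools(markdown: str) -> dict[str, str]:
    """Split the markdown into {tool_name: description}."""
    tools = {}
    for section in re.split(r"^## ", markdown, flags=re.MULTILINE)[1:]:
        name, _, body = section.partition("\n")
        tools[name.strip()] = body.strip()
    return tools


def words(text: str) -> set[str]:
    """Lowercase words longer than three letters (a crude stop-word filter)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def select_tools(task: str, tools: dict[str, str], limit: int = 2) -> list[str]:
    """Stand-in for the sub-agent: rank tools by word overlap with the task
    and return only the names that matched at all."""
    task_words = words(task)
    scored = [(len(task_words & words(name + " " + desc)), name)
              for name, desc in tools.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:limit]]
```

Only the list returned by `select_tools` would go back to the main agent; the full file never enters its context.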

snek_case 19 hours ago | parent | prev | next [-]

I've had a similar experience with Gemini ignoring things I've explicitly told it (sometimes more than once). It's probably context rot. LLMs give you a huge advertised number of tokens in the context, but the more stuff you put in there, the less reliably they remember everything, which makes sense given how transformer attention blocks work internally.

cerved 18 hours ago | parent | prev | next [-]

Claude is pretty good at forgetting to run maven with the -am flag, writing bash with heredocs that its interpreter doesn't weird out on, using the != operator in jq. Maybe Claude has early onset dementia.

vendiddy 15 hours ago | parent [-]

Demented AIs running amok is just what we need in this day and age.

airspresso 3 hours ago | parent | prev | next [-]

Sounds like you're fighting the weights. What would it take to align the setup with what the LLM expects?

notpublic 15 hours ago | parent | prev | next [-]

Instead of including all these instructions in CLAUDE.md, have you considered using custom Skills? I’ve implemented something similar, and Skills work really well. The only downside is that they may consume more tokens.

stpedgwdgfhgdd 13 hours ago | parent | next [-]

The matching logic for a skill is pretty strict. I wonder whether mentioning ‘git’ in the front matter and using ‘gitlab’ would give a match for a skill to get triggered.

taytus 13 hours ago | parent | prev [-]

Yes, sometimes skills are more reliable, but not always. That is the biggest culprit to me so far. The fact that you cannot reliably trust these LLMs to follow steps or instructions makes them unsuitable for my applications.

notpublic 13 hours ago | parent [-]

Another thing that helps is adding a session hook that triggers on startup|resume|clear|compact to remind Claude about your custom skills. Keeps things consistent, especially when you're using it for a long time without clearing context
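As a rough sketch, the script behind such a hook could look like the Python below. The four event names come from the comment above; everything else is an assumption: the `source` key in the input payload, the skill names, and the idea that whatever the script writes to its output ends up in the model's context. How the hook is registered and what it actually receives should be checked against the tool's own hook documentation.

```python
import json

# Hypothetical skill names, for illustration only.
SKILLS = ["gitlab-review", "run-tests"]

# Session events named in the comment above.
REMIND_ON = {"startup", "resume", "clear", "compact"}


def build_reminder(event: dict) -> str:
    """Return reminder text for a session event, or "" if none applies.
    The "source" key is an assumed field of the hook's input payload."""
    if event.get("source") not in REMIND_ON:
        return ""
    return "Reminder: custom skills available: " + ", ".join(SKILLS)


def handle(stream_in, stream_out) -> None:
    """Read one JSON event from stream_in and write the reminder, if any."""
    raw = stream_in.read()
    event = json.loads(raw) if raw.strip() else {}
    reminder = build_reminder(event)
    if reminder:
        stream_out.write(reminder + "\n")
```

Run as a script, this would be called as `handle(sys.stdin, sys.stdout)`.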

nautilus12 13 hours ago | parent | prev [-]

I had the same problem. My Claude md eventually gets forgotten and it forgets best practices that I put in there. I've switched to using hooks that run it through a variety of things like requiring testing. That seems to work better than Claude md because it has to run the hook every time it makes changes.

ewoodrich 4 hours ago | parent [-]

I really need something like this set up for tasks I want Claude to run before handing off a task to me as "complete". It routinely ignores my instructions about checklist items that need to be satisfied for a task to be considered successful. I have a helper script documented in CLAUDE.md that lets Claude or me get specific build/log outputs with a few one-liner commands, yet Claude can't be bothered to remember running them half the time.

Way too frequently Claude goes, "The task is fully implemented, error free with tests passing and no bugs or issues!" and I have to reply "did you verify server build/log outputs with run-dev per CLAUDE.md". It immediately knows the command I am referencing from the instructions buried in its context already, notices an issue and then goes back and fixes it correctly the second time. Whenever it happens it instantly makes an agentic coding session go from feeling like breezy, effortless fun to pulling teeth.

I've started to design a subagent to handle chores after every task to avoid context pollution but it sounds like hooks are the missing piece I need to deterministically guarantee it will run every time instead of just when Claude feels the vibes are right.
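One way to picture the deterministic version: a small checklist runner that a hook (or a human) invokes after every task, and whose exit code decides whether "complete" is allowed. This is a sketch only; the function names are invented, and how a non-zero exit code is treated depends on whatever calls the script.

```python
import subprocess


def run_checklist(commands: list[list[str]]) -> list[str]:
    """Run every command; return a description of each one that failed."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"{' '.join(cmd)} exited with {result.returncode}")
    return failures


def exit_code_for(failures: list[str]) -> int:
    """Non-zero when anything failed, so the caller can refuse 'complete'."""
    return 1 if failures else 0
```

Usage would be something like `sys.exit(exit_code_for(run_checklist([["./run-dev", "build"]])))`, where the `run-dev` arguments are hypothetical. The point is that the checks run because the script runs, not because the model remembered them.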

falcor84 a day ago | parent | prev | next [-]

That's exactly what Claude Skills do [0], and while this tool search appears to be distinct, I do think that they're on the way to integrating MCP and Skills.

[0] https://code.claude.com/docs/en/skills

esperent 20 hours ago | parent [-]

I haven't had much luck with skills being called appropriately. When I have a skill called "X doer", and then I write a prompt like "Open <file> and do X", it almost never loads up the skill. I have to rewrite the prompt as "Open <file> and do X using the X doer skill".

Which is basically exactly as much effort as what I was doing previously of having prewritten sub-prompts/agents in files and loading up the file each time I want to use it.

I don't think this is an issue with how I'm writing skills, because it includes skills like the Skill Creator from Anthropic.

notpublic 13 hours ago | parent | next [-]

Try adding a session hook that triggers on startup|resume|clear|compact to remind Claude about your custom skills.

esperent 11 hours ago | parent [-]

Is there a session start hook? I don't think so, unless it was added recently.

I've mostly been working on smaller projects so I never need to compact. And skills are definitely not working even on the initial prompt of a new session.

slhck 17 hours ago | parent | prev [-]

Same experience here – it seems I have to specifically tell it to use the "X skill" to trigger it reliably. I guess with all the different rules set up for Claude to follow, it needs that particular word to draw its attention to the required skill.

_joel 14 hours ago | parent [-]

Ditto, I also find it'll invariably decide to disregard the CLAUDE.md again and produce a load of crap I didn't really ask it for.

JyB a day ago | parent | prev | next [-]

That’s exactly what it is in essence. The MCP protocol simply doesn’t have any mechanism specifications (yet) for not loading tools completely in the context. There’s nothing really strange about it. It’s just a protocol update issue.

noodletheworld 19 hours ago | parent | prev [-]

> I never really understood why you have to stuff all the tools in the context.

You probably don't for... like, trivial cases?

...but tool use is usually the most fine-grained point in an agent's step-by-step implementation plan; so when planning, if you don't know what tool definitions exist, an agent might end up solving a problem naively step-by-step using primitive operations, when a single tool already exists that does that, or does part of it.

Like, it's not quite as simple as "Hey, do X"

It's more like: "Hey, make a plan to do X. When you're planning, first fetch a big list of the tools that seem vaguely related to the task and make a step-by-step plan keeping in mind the tools available to you"

...and then, for each step in the plan, you can do a tool search to find the best tool for x, then invoke it.
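The two phases can be sketched in Python as below: a compact overview that is cheap enough to keep in context while planning, and a per-step search over full descriptions. The catalog, the category names, and the word-overlap scoring are all made up for illustration; a real tool search would use something better than shared words.

```python
from typing import Optional

# Hypothetical tool catalog: name -> (category, description).
CATALOG = {
    "git_commit": ("vcs", "commit staged changes with a message"),
    "git_diff": ("vcs", "show the diff between two revisions"),
    "deploy_service": ("deploy", "build and deploy a service to staging"),
    "tail_logs": ("ops", "stream recent log lines from a service"),
}


def tokens(text: str) -> set[str]:
    return set(text.lower().replace("_", " ").split())


def overview(catalog: dict) -> dict[str, list[str]]:
    """Phase 1: category -> tool names, for use while planning."""
    grouped: dict[str, list[str]] = {}
    for name in sorted(catalog):
        grouped.setdefault(catalog[name][0], []).append(name)
    return grouped


def best_tool(step: str, catalog: dict) -> Optional[str]:
    """Phase 2: pick the tool whose name and description share the most
    words with one plan step; None if nothing overlaps."""
    step_words = tokens(step)
    best_name, best_score = None, 0
    for name in sorted(catalog):
        score = len(step_words & tokens(name + " " + catalog[name][1]))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

The planner only ever sees the output of `overview`, which is what keeps it from reinventing `deploy_service` out of shell commands; the full definition is fetched once a step actually needs it.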

Without a top level context of the tools, or tool categories, I think you'll end up in some dead-ends with agents trying to use very low level tools to do high level tasks and just spinning.

The higher level your tool definitions are, the worse the problem is.

I've found this is the case even now with MCP, where sometimes you have to explicitly tell an agent to use particular tools, not to try to re-invent stuff or use bash commands.