esafak 11 hours ago
Does anyone find that agents just don't use them without being asked?
libraryofbabel 11 hours ago
This has been a problem for us too. Sometimes they reach for skills, sometimes they don’t and just try to do the thing on their own. It’s annoying. I think this is (mostly) a solvable problem. The current generation of SotA models wasn’t RLVR-trained on skills (they didn’t exist at that time) and probably gets slightly confused by the way the little descriptions are all packed into the same tool call schema. (At least that’s how it works with Claude Code.) The next generation will have likely been RLVRed on a lot of tasks where skills are available, and will use them much more reliably. Basically, wait until the next Opus release and you should hopefully see major improvements. (Of course, all this stuff is non-deterministic blah blah, but I think it’s reasonable to expect going from “misses the skill 30% of the time” to “misses it 2% of the time”.) | ||||||||||||||
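For context on how those "little descriptions" are surfaced: in Claude Code, a skill is a SKILL.md file whose YAML frontmatter `description` is what the model sees when deciding whether to load the full skill. A minimal sketch (the skill name and body here are made up for illustration):

```markdown
---
name: pdf-extraction
description: Extract text and tables from PDF files. Use this when the user
  asks to read, parse, or summarize a PDF.
---

# PDF extraction

1. Convert the PDF to text first; fall back to OCR for scanned pages.
2. Preserve table structure as markdown tables.
```

Only the `description` lines are packed into the model's context up front; the body below the frontmatter is loaded on demand, which is why a vague description can cause the model to miss the skill entirely.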
| ||||||||||||||
modernerd 11 hours ago
That's also what Vercel found:

> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it. Adding the skill produced no improvement over baseline.

> …

> Skills aren't useless. The AGENTS.md approach provides broad, horizontal improvements to how agents work with Next.js across all tasks. Skills work better for vertical, action-specific workflows that users explicitly trigger.

https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
jillesvangurp 11 hours ago
Depends on what you use, perhaps. I use Codex and it seems to mostly stick to the instructions I give. I use an AGENTS.md that explicitly points to the repository's skill directory. I mostly keep instructions in there for obvious things like how to build, how to test, what to do before declaring a thing done, etc.

I don't tend to have a lot of skills in there either. Probably the more skills you have, the more confused it gets: the more potentially conflicting instructions you give, the harder it is for an LLM to figure out what you actually want to happen. If I catch it going off script, I often interrupt it, tell it what to do, and update the relevant skill. Seems to work pretty well. Keeping things simple works.
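A minimal sketch of the kind of AGENTS.md described above; the directory name and commands are hypothetical, not from any particular repo:

```markdown
# AGENTS.md

## Skills
Reusable skills live in `.skills/` at the repository root. Check that
directory for a relevant skill before starting a task.

## Workflow
- Build: `make build`
- Test: `make test`
- Before declaring a task done: run the full test suite and fix any failures.
```

The point is that AGENTS.md is read unconditionally at the start of a session, so a pointer placed there doesn't depend on the model deciding to invoke anything.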
rco8786 11 hours ago
Yep. I have an incredibly hard time getting them to use skills at all, even when asked. I saw someone's analysis a few days ago, and they found that their agents were more accurate when just dumping the skill context directly into AGENTS.md.
troupo 11 hours ago
Because "skills" are just .md files that the lossy, compressing statistical output machine may or may not find, and that may or may not be retained in its tiny context window.
| ||||||||||||||
shmoogy 11 hours ago
I often find they aren't triggered when I would expect, so I use a keyword and explicitly trigger them.
tobyhinloopen 11 hours ago
Same! If I put the skill's instructions in the general AGENTS.md, it works just fine. | ||||||||||||||