Frannky 5 hours ago

I think skills are unreliable for anything beyond simple tasks. For better reliability, I have the agent trigger APIs that handle the complex logic (and their own LLM calls) internally. Has anyone found a solid strategy for making complex 'skills' more dependable?

selridge 4 hours ago

In my experience, every text “instruction” to the agent should be taken on a prayer. If you write compact agent guidance that is not contradictory and is local and useful to your project, the agent will follow it most of the time. There is nothing you can write that will force the agent to follow it all of the time.

If one can accept failure to follow instructions, then the world is open. That condition does not really comport with how we think about machines. Nevertheless, it is the case.

Right now, a productive split is to place things that you need to happen into tooling and harnessing, and place things that would be nice for the agent to conceptualize into skills.
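A minimal sketch of that split, assuming a hypothetical `agent_step` callable that takes a prompt and returns text. The required behavior lives in harness code (`checks`), not in prompt wording; `run_with_harness` and `must_mention_tests` are illustrative names, not part of any real agent framework.

```python
from typing import Callable, List, Tuple

# A check inspects the agent's output and returns (passed, message).
Check = Callable[[str], Tuple[bool, str]]


def run_with_harness(
    agent_step: Callable[[str], str],  # hypothetical: prompt in, text out
    task: str,
    checks: List[Check],
    max_attempts: int = 3,
) -> str:
    """Enforce required conditions in code instead of trusting instructions."""
    feedback = ""
    for _ in range(max_attempts):
        output = agent_step(task + feedback)
        failures = [msg for ok, msg in (check(output) for check in checks) if not ok]
        if not failures:
            return output
        # Feed the failures back; the loop, not the prompt, guarantees the check ran.
        feedback = "\nFix these problems: " + "; ".join(failures)
    raise RuntimeError(
        f"agent output failed required checks after {max_attempts} attempts"
    )


def must_mention_tests(output: str) -> Tuple[bool, str]:
    """Example check (an assumption for illustration): output must mention tests."""
    return ("test" in output.lower(), "output must mention tests")
```

The agent can still ignore the guidance, but the harness either gets a passing output or fails loudly instead of silently accepting a miss.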

Frannky 2 hours ago

Yeah, that's my experience too

plufz 5 hours ago

My only strategy is what used to be called slash-commands but are also skills now, i.e., I call them explicitly. I think that actually works quite well, and you can allow specific tools and tell it to use specific hooks for security or validation in the frontmatter properties.

chickensong 5 hours ago

Is it that the skills aren't being triggered reliably, or that they get triggered but the skill itself is complex and doesn't work as expected?

Frannky 4 hours ago

both

chickensong 4 hours ago

I haven't done a lot with skills yet, but maybe try and leverage hooks to enforce skill usage, and move most of the skill's logic and complexity into a script so the agent only needs to reason about how to call the script.
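As a rough illustration of the "move the logic into a script" half of this, here is a sketch of a script with a narrow, validated interface. The deploy scenario, the `--env` and `--version` flags, and the step list are all assumptions made up for the example; the point is only that the agent chooses two arguments and the script owns everything else.

```python
import argparse
import re
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical interface: the agent only has to pick these two values.
    parser = argparse.ArgumentParser(description="Hypothetical script called by a skill")
    parser.add_argument("--env", choices=["staging", "production"], required=True)
    parser.add_argument("--version", required=True)
    return parser


def run(argv: List[str]) -> dict:
    """Validate the arguments and return a deterministic plan as a dict."""
    args = build_parser().parse_args(argv)
    if not re.fullmatch(r"\d+\.\d+\.\d+", args.version):
        # Return a structured error the agent can read and correct.
        return {"ok": False, "error": f"version must look like 1.2.3, got {args.version!r}"}
    # The complex, order-sensitive logic stays here, outside the agent's reasoning.
    steps = ["run tests", f"build {args.version}", f"deploy to {args.env}"]
    return {"ok": True, "steps": steps}
```

An entry point could call `run(sys.argv[1:])`, print the result as JSON, and exit non-zero when `ok` is false, so the skill text shrinks to "call this script with `--env` and `--version`".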

Frannky 21 minutes ago

I think I'll wait until they are more reliable. For now, I use skills, but they just specify which endpoint to call. It should also be safer: a different VPS, and no access to credentials other than the bearer token.
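For anyone wanting to try the same setup, a sketch of what the agent-side call might look like, using only the Python standard library. The endpoint URL, payload shape, and function name are assumptions for illustration; the only secret the caller holds is the bearer token.

```python
import json
import urllib.request


def build_skill_request(endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to a hypothetical remote endpoint.

    The complex logic and any other credentials live on the remote server;
    this side only knows the endpoint and the bearer token.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be something like `urllib.request.urlopen(req, timeout=30)`, with the token read from an environment variable rather than written into the skill file.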