imiric 2 hours ago

Great, so how do you know this stuff works? Did you evaluate it against other approaches? How do you know it's actually reliable?

The Vercel team had some interesting findings[1]:

> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it.

Others reached different conclusions about commonly accepted practices[2], some of which you may have adopted from reading documentation that surely didn't come from influencers.

And yet others swear by magical Markdown documents[3].

So... who is the ultimate authority on what actually works, and who is just cargo culting the trendy practice of the week? And how is any of this different from what was being done a few years ago?

[1]: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

[2]: https://arxiv.org/abs/2602.11988

[3]: https://soul.md/

JohnMakin 12 minutes ago

Sorry, but given your first comment, I don't feel particularly inclined to help you figure this out. I was simply pointing out that I've already deployed these things successfully at scale, using many of the configuration options documented in the OP here. This stuff isn't some mystical black box, although you seem to think it is.

I measure the tooling's success with a suite of small prompt tests that perform repeatable tasks, tracking the success rate over time. I also educate the broader team and share my own field-tested skills, which have had similar success across other teams. We've seen a large increase in velocity and a lower bug rate, both of which are easy to measure (and have long been tracked), and those results are what put me in the position I am in, which was not a reluctant one. You're free to look through my long history on this topic on this forum to see that I am a complete skeptic and wouldn't be here unless I had to.

Everyone is still figuring this out. There is no authority; I am my own authority on what I have seen work and what hasn't. Take that for what you will. I just wanted to offer a counterpoint to your initial claim. For obvious reasons, I'm certainly not going to detail exactly what has and hasn't worked for my org.

Have a good day!