Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs (github.com)
26 points by darkrishabh 5 hours ago | 5 comments
ssgodderidge an hour ago
The example model in the documentation is 4o-mini; you might want to update that to a more recent model. As an aside, 4o-mini came out months before agent skills were released… I’m curious how well it handles choosing to load skills in the first place.
egeozcan 2 hours ago
Are there any published results gathered using this?
ianhxu an hour ago
How do you iterate on the judge prompt? Is there an auto-rater?