pamelafox 7 hours ago

This is why I only add information to AGENTS.md when the agent has failed at a task. Then, once I've added the information, I revert the agent's changes, re-run the task, and see if the output has improved. That way, I can have more confidence that AGENTS.md has actually improved coding agent success, at least with the given model and agent harness.
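For illustration, here is a hypothetical AGENTS.md entry of the kind this workflow produces; the commands and file names are invented examples, not taken from the comment:

```markdown
## Running tests

<!-- Added after the agent repeatedly guessed the wrong test command. -->
- Run unit tests with `python -m pytest tests/unit` from the repo root.
- Do not invoke `pytest` directly; the project depends on options set in `pyproject.toml`.
```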

I do not do this for all repos, but I do it for the repos where I know that other developers will attempt very similar tasks, and I want them to be successful.

viraptor 6 hours ago | parent | next [-]

You can also save time and tokens: if you see that every request starts by looking for the same information, you can front-load it.

sebazzz 6 hours ago | parent | next [-]

It also takes the randomness out of it: otherwise the agent sometimes executes tests one way, sometimes another.

Maxion an hour ago | parent [-]

I've found https://github.com/casey/just to be very, very useful. It lets you bind common commands to simple, shorter commands that can be easily referenced. Good for humans too.
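As a minimal sketch of what that looks like (the recipe names and bodies here are invented examples, assuming a Python project):

```just
# Run the unit test suite the same way every time.
test:
    python -m pytest tests/unit

# Lint and type-check before committing.
check:
    ruff check .
    mypy src/
```

With recipes like these, AGENTS.md only needs to say "run `just test`" rather than spell out the full commands, which also removes the run-to-run variation mentioned above.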

NicoJuicy 5 hours ago | parent | prev [-]

Don't forget to update it regularly then

averrous 7 hours ago | parent | prev | next [-]

Agreed. I've also found that a rule-discovery approach like this performs better. It's like teaching a student: they've probably already performed well on some tasks, and if we feed in an extra rule for something they're already well versed in, it can hinder their creativity.

imiric 6 hours ago | parent | prev [-]

That's a sensible approach, but it still won't give you 100% confidence. These tools produce different output even when given the same context and prompt. You can't really be certain that the output difference is due to isolating any single variable.

pamelafox 6 hours ago | parent | next [-]

So true! I've also set up automated evaluations using the GitHub Copilot SDK so that I can re-run the same prompt and measure results. I only use that when I want even more confidence, typically when I want to compare models more precisely. I do find that the results have been fairly similar across runs for the same model/prompt/settings, even though we cannot set a seed for most models/agents.
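The Copilot SDK specifics aren't shown here, so the following is only a generic sketch of that kind of repeated-run evaluation; `run_agent` and `grade` are hypothetical stand-ins for the SDK call and your own scoring logic:

```python
from statistics import mean
from typing import Callable

def evaluate(
    prompt: str,
    run_agent: Callable[[str], str],  # hypothetical: one agent run, returns its output
    grade: Callable[[str], float],    # hypothetical: scores an output against a rubric
    runs: int = 5,
) -> float:
    """Re-run the same prompt several times and report the mean score."""
    return mean(grade(run_agent(prompt)) for _ in range(runs))
```

Running the same prompt set through this for each model gives a rough, variance-aware comparison rather than a single anecdotal run.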

ChrisGreenHeur 5 hours ago | parent | prev [-]

Same with people: no matter what info you give a person, you can't be sure they'll follow it the same way every time.