Arifcodes 4 hours ago

The study measures the wrong thing. Task completion ("does the PR pass tests?") is a narrow proxy for what AGENTS.md actually helps with in production.

I run a system where multiple AI agents share a codebase every day. The AGENTS.md file doesn't exist to help the agent figure out how to fix a bug. It exists to encode tribal knowledge that would take a human weeks to accumulate: which directory owns what, how the deploy pipeline works, what patterns the team settled on after painful debates. Without it, the agent "succeeds" at the task but produces code that looks like it was written by someone who joined the team yesterday. It passes tests but violates every convention.

The finding that context files "encourage broader exploration" is actually the point. I want the agent to read the testing conventions before writing tests. I want it to check the migration patterns before creating a new table. That costs more tokens, yes. But reverting a merged PR that used the wrong ORM pattern costs more than 20% extra inference.

Gigachad 2 hours ago

What are you putting in the file? When I've looked at them, they read like a second README, minus the promotional material of a typical GitHub README.

Arifcodes 43 minutes ago

The useful stuff is different from a README. A README tells humans how to use the project. An AGENTS.md tells the AI how to work on it.

Mine typically includes:

- Build/test commands that aren't obvious from package.json (e.g. "run migrations before tests")
- Architecture decisions that would take the agent 10 minutes to reverse-engineer ("auth goes through middleware X, not controller Y")
- Known gotchas ("don't touch the legacy billing module, it's being replaced next sprint")
- Deploy process specifics ("push to main auto-deploys staging, prod needs a manual tag")
- Coding conventions that aren't in the linter ("we use Result types for errors, never throw")

The ones that look like READMEs are indeed useless. The good ones read more like the notes you'd give a new senior engineer on their first day. Stuff that's obvious to the team but invisible to an outsider.
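For anyone unfamiliar with the last convention in that list ("Result types for errors, never throw"), here is a minimal TypeScript sketch of what it usually looks like. The `Result` type, the `ok`/`err` helpers, and `parsePort` are hypothetical illustrations, not code from the commenter's project or from any particular library:

```typescript
// Hypothetical Result type (not from any specific library): a function
// returns either a success value or an error value instead of throwing.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Helpers that build each variant of the union.
function ok<T>(value: T): Result<T, never> {
  return { ok: true, value };
}

function err<E>(error: E): Result<never, E> {
  return { ok: false, error };
}

// Hypothetical example: parse a TCP port without ever throwing.
function parsePort(input: string): Result<number, string> {
  const n = Number(input);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    return err(`invalid port: ${input}`);
  }
  return ok(n);
}

// Callers must handle both branches; checking `r.ok` narrows the type,
// so `r.value` is only reachable on success and `r.error` only on failure.
const r = parsePort("8080");
if (r.ok) {
  console.log(`listening on ${r.value}`);
} else {
  console.error(r.error);
}
```

A linter rarely enforces this kind of rule, which is why it's the sort of thing worth spelling out for an agent that would otherwise default to `throw new Error(...)`.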

0x696C6961 an hour ago

That's basically all it is: a README that is guaranteed to be read, so the agent doesn't spend 10 minutes trying to reconfigure the toolchain because the first command it guessed didn't work.