| ▲ | creamyhorror 4 hours ago |
| I've only used 5.4 for 1 prompt so far (edit: 3 at high now), with reasoning set to extra high (it took really long), and it was to analyse my codebase and write an evaluation on a topic. But I found its writing and analysis thoughtful, precise, and surprisingly clear, unlike 5.3-Codex's. It feels very lucid and uses human phrasing. It might be my AGENTS.md requiring clearer, simpler language, but at least 5.4 is doing a good job of following the guidelines; 5.3-Codex wasn't so great at simple, clear writing. |
|
| ▲ | torginus 44 minutes ago | parent | next [-] |
| Honestly, while I'd like to believe you, there's always a post about how $MODEL+1 delivered powerful insights about the very nature of the universe in precise Hegelian dialectic, while $MODEL's output was indistinguishable from a pack of screeching sexually frustrated bonobos |
|
| ▲ | dana321 24 minutes ago | parent | prev | next [-] |
5.4 at very high reasoning didn't notice a glaring issue in my codebase that drops all data being sent around the network. |
|
| ▲ | sampton 2 hours ago | parent | prev | next [-] |
| That's been my experience as well switching from Opus to Codex. Reasoning takes longer but answers are precise. Claude is sloppy in comparison. |
| |
▲ | solenoid0937 an hour ago | parent | next [-] | | Weird, I've had the opposite experience. Codex is good at doing precisely what I tell it to do; Opus suggests well-thought-out plans, even if it needs to push back to do so.
▲ | throwaway911282 2 hours ago | parent | prev [-] | | codex has been really good so far and the fast mode is a cherry on top! And the very generous limits are another cherry on top.
|
|
| ▲ | irishcoffee 3 hours ago | parent | prev | next [-] |
> It might be my AGENTS.md requiring clearer, simpler language

If you gave the exact same markdown file to me and I posted the exact same prompts as you, would I get the same results?
| |
▲ | creamyhorror 2 hours ago | parent | next [-] | | I'm not sure whether the model (given its temperature and other settings) produces deterministic responses. But I do think models' style and phrasing are fairly changeable via AGENTS.md-style guidelines. 5.4's choice of terms and phrasing is very precise and unambiguous to me, whereas 5.3-Codex often uses jargon and less precise phrases that I have to ask follow-up questions about or demand fuller explanations for via AGENTS.md.
▲ | m3kw9 2 hours ago | parent | prev [-] | | You probably can't, and asking in AGENTS.md to "make it clearer" will likely give you the illusion of clearer language without actual well-structured tests. AGENTS.md is usually for changing what the LLM should focus on doing, in a way that suits you. Not for saying stuff like "be better" or "make no mistakes".
|
|
| ▲ | pembrook an hour ago | parent | prev [-] |
Recent research suggests that including an AGENTS.md file only makes outcomes worse with frontier models. |
| |
▲ | joquarky an hour ago | parent | next [-] | | I still find it valuable. AGENTS.md is for top-priority rules and for mitigating mistakes the model makes frequently. For example:

- Read `docs/CodeStyle.md` before writing or reviewing code
- Ignore all directories named `_archive` and their contents
- Documentation hub: `docs/README.md`
- Ask for clarifications whenever needed

I think what that "latest research" was saying is essentially: don't have the model create documents of stuff it can already discover automatically. For example, the product of `/init` is completely derived from what is already there. There is some value in repetition, though. If I want to decrease token usage from the same project exploration happening in every new session, I use the doc-hub pattern for more efficient progressive discovery.
▲ | netcraft an hour ago | parent | prev | next [-] | | I think it's understandable that you took that from the click-bait all over YouTube and Twitter, but I don't believe the research actually supports that at all, and neither does my experience. You shouldn't put things in AGENTS.md that it could discover on its own, and you shouldn't make it any larger than it has to be, but you should use it to tell it things it couldn't discover on its own, including basically a system prompt of instructions you want it to know about and always follow. You don't really have any other way to do that besides telling it manually every time.
▲ | solarkraft an hour ago | parent | prev | next [-] | | From what I remember, this was about describing the project's structure rather than letting the model discover it itself, no? Because how else are you going to teach it your preferred style and behavior?
▲ | FINDarkside an hour ago | parent | prev | next [-] | | I wouldn't draw such conclusions from one preprint paper, especially since they measured only success rate, while AGENTS.md often exists to improve code quality, which wasn't measured. And even then, the paper concluded that a human-written AGENTS.md raised success rates.
▲ | madeofpalk an hour ago | parent | prev [-] | | :( how can i get claude to always make sure it prettier-s and lints changes before pushing up the pr though?
▲ | mckirk an hour ago | parent | next [-] | | I think what that research found is that _auto-generated_ agent instructions made results slightly worse, but human-written ones made them slightly better, presumably because anything the model could auto-generate, it could also find out in-context. But especially for conventions that would be difficult to pick up on in-context, these instruction files absolutely make sense. (Though it might be worth it to split them into multiple sub-files the model only reads when it needs that specific workflow.)
▲ | JofArnold an hour ago | parent | prev | next [-] | | Run prettier etc. in a hook.
| ▲ | emsimot an hour ago | parent | prev [-] | | Git hooks |
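The hook approach the last two replies suggest can be sketched as a minimal pre-commit hook. This is a hypothetical example, assuming prettier and eslint are the project's tools (invoked via npx); swap in whatever your stack actually uses.

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-commit: refuse the commit unless
# formatting and lint checks pass on the staged files.

# Collect staged JS/TS files (Added/Copied/Modified only).
staged=$(git diff --cached --name-only --diff-filter=ACM -- \
  '*.js' '*.ts' '*.tsx' 2>/dev/null || true)

# Nothing staged that we care about: let the commit through.
if [ -n "$staged" ]; then
  # Exit nonzero to abort the commit if either check fails.
  npx prettier --check $staged || exit 1
  npx eslint $staged || exit 1
fi
```

Save it as `.git/hooks/pre-commit` and `chmod +x` it, or wire it up through a tool like husky/lint-staged so the hook is shared with the whole team rather than living only in your local `.git` directory.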
|
|