I'm a fairly senior frontend engineer using React, and even I see Sonnet 4.5 struggle with basic things. Today it wrote my Zod validation incorrectly, mixing up versions, then just decided it wasn't working and attempted to replace the entire thing with a different library.

baq 2 days ago | parent | next [-]

There’s little reason to use Sonnet anymore. Haiku for summaries, Opus for anything else. Sonnet isn’t a good model by today’s standards.

subomi 2 days ago | parent | prev [-]

Why do we all of a sudden hold these agents to some unrealistically high bar? Engineers write bugs and incorrect validations all the time. But we iterate. We read the stack trace in Sentry, wonder what the hell we were thinking when we wrote that, and fix things. If you're going to benefit from these agents, you need to be a bit more patient and point them to the right parts of your codebase.

My rule of thumb is that if you can clearly describe exactly what you want to another engineer, then you can instruct the agent to do it too.

puttycat 2 days ago | parent | next [-]

> Engineers write bugs all the time

Why do we hold calculators to such high bars? Humans make calculation mistakes all the time.

Why do we hold banking software to such high bars? People forget where they put their change all the time.

Etc etc.

Der_Einzige 2 days ago | parent [-]

I don't hold calculators to high bars. They think 0.1 + 0.2 = 0.30000000000000004:

https://qntm.org/notpointthree

recursive a day ago | parent [-]

Some of them. The good ones don't.
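
To make the distinction concrete, here is a minimal Python sketch. It is only an illustration and is not taken from the linked page; the variable names are made up. It assumes the "bad" calculators use binary floating point and the "good" ones use decimal or rational arithmetic, which is a guess about what is meant above.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point: neither 0.1 nor 0.2 has an exact base-2
# representation, so the rounding error shows up in the sum.
float_sum = 0.1 + 0.2

# Decimal arithmetic built from strings keeps the base-10 digits exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# Rational arithmetic is exact as well.
fraction_sum = Fraction(1, 10) + Fraction(2, 10)

print(float_sum)         # 0.30000000000000004
print(float_sum == 0.3)  # False
print(decimal_sum)       # 0.3
print(fraction_sum)      # 3/10
```

The float result is not a bug in any one calculator; it is what IEEE 754 double precision gives in most languages. Getting 0.3 requires choosing a different number representation.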

lelandfe 2 days ago | parent | prev [-]

my unrealistic bar lies somewhere above "pick a new library" bug resolution