mike_hearn 15 hours ago

That's an interesting claim, but I don't see it in my own work. They have got better but it's very hard to quantify. I just find myself editing their work much less these days (currently using GPT 5.4).

dwedge 15 hours ago | parent | next [-]

Without meaning to sound dismissive, because I'm really not intending to be, there's also the possibility that you've gotten worse after enough time using them. You're treating yourself as a constant in this, but a man cannot step into the same river twice.

Mond_ 15 hours ago | parent | next [-]

This is such a silly response when "you've gotten better at using them and learned to work around their flaws" is right there and seems a lot more plausible.

mike_hearn 13 hours ago | parent | prev [-]

That's a possibility, but I doubt it. I've been programming for 35 years and know what I like in code. I've also previously maintained a long review prompt in which I tell the models all the ways they get things wrong and have them look for and fix those problems. But those review passes now take less time because there are fewer such problems to begin with.

In particular GPT 5.4 is much better at not duplicating code unnecessarily. It'll take the time to refactor, to search for pre-existing utility functions, etc.

nkozyra 15 hours ago | parent | prev | next [-]

The problem with evals is that the underlying rubric will always be either subjective or a quantitative score based on something that is likely now baked directly into the training set.

You kind of have to go on "feels" for a lot of this.

mountainriver 4 hours ago | parent | prev [-]

Yeah, same here, and all my coworkers feel the same way.

Most of us have been coding for ages. I actually find it really odd that people keep trying to disprove things that are relatively obvious to anyone working with LLMs.