mike_hearn 15 hours ago
That's an interesting claim, but I don't see it in my own work. They have got better but it's very hard to quantify. I just find myself editing their work much less these days (currently using GPT 5.4).
dwedge 15 hours ago
Without meaning to sound dismissive, because I'm really not intending to, there's also the possibility that you've gotten worse after enough time using them. You're treating yourself as a constant in this, but man cannot walk in the same river twice.
nkozyra 15 hours ago
The problem with evals is that the underlying rubric will always be either subjective, or a quantitative score based on something that is likely now baked into the training set directly. You kind of have to go on "feels" for a lot of this.
mountainriver 4 hours ago
Yeah, same, and all my coworkers feel the same. Most of us have been coding for ages. I actually find it really odd that people keep trying to disprove things that are relatively obvious with LLMs.