| ▲ | mincer_ray 6 hours ago |
| seems like a really minor upgrade? |
|
| ▲ | Nicholas_C 6 hours ago | parent | next [-] |
| I think they will all be minor going forward. It feels like the major improvements have all been made and we'll only see incremental improvements from here on out. Maybe I'm wrong, but we'll see. |
| |
| ▲ | spelk 5 hours ago | parent | next [-] | | Hard to say. People made the same prediction a year ago because we supposedly ran out of training data. There could be indefinite rapid compounding improvements so long as there's free money out there. | |
| ▲ | jmalicki 5 hours ago | parent [-] | | With RLHF and RLVR we are creating tons of new training data that is much more focused than what you get from reading the Internet. Annotation shops are doing many billions per year in revenue creating newer data, and a lot of it is highly complex, focused on rewarding multi-turn agentic trajectories. |
| |
| ▲ | conradkay an hour ago | parent | prev | next [-] | | I think there's just less time between model releases now | |
| ▲ | Eufrat 5 hours ago | parent | prev | next [-] | | I think one of the challenges is that the models were all initially trained on the entire Internet (or as much of it as they could gather), and now they’re having to deal with an increasing share of the Internet being AI-generated content. That may be why GPT-5.5 started being obsessed with goblins, and why you start seeing amusing things in the system prompt trying to get the model to stop bringing them up. | |
| ▲ | chandureddyvari 5 hours ago | parent | prev [-] | | Wasn't Mythos a step change improvement? |
|
|
| ▲ | pmxi 5 hours ago | parent | prev | next [-] |
| Yeah. They are aware:
"Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor." |
|
| ▲ | scotty79 an hour ago | parent | prev | next [-] |
| I think we lack benchmarks that could meaningfully indicate progress. They are mostly garbage that's saturated at this point. God wouldn't score much higher on them. |
|
| ▲ | teeray 5 hours ago | parent | prev [-] |
| Yes, but if version number go up, so do all other number |