sarreph 2 hours ago
I had intended to caveat that: I'm sure I'm not the first person to ask about this!

> you still see improvements

This is expected if they are training their models on it, right?

> objectively-bad results

Keen to learn when this has been the case, i.e. across version increments in major models.
simonw 2 hours ago
I've written about this a couple of times, most notably here: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

I've been enjoying seeing how the quality of individual models differs based on the amount of reasoning effort you give them. If they were baking in a good pelican, you wouldn't expect them to differ so much.

(Google Gemini are the only lab that have very clearly paid attention to the quality of SVG animals-riding-vehicles; see their announcement for Gemini 3.1: https://twitter.com/JeffDean/status/2024525132266688757 )
| |||||||||||||||||