|
| square_usual 3 days ago |
| It has held this position since at least June. The Aider LLM leaderboards [1] show the June version of Sonnet 3.5 beating 4o handily. Only o1-preview beat it, and narrowly, but IIRC at a much higher cost. The October version of Sonnet 3.5 has retaken the lead by a wide margin. [1] https://aider.chat/docs/leaderboards/ |
|
| iLemming 3 days ago |
| Anecdotally, Claude seems to hallucinate more during certain hours. It's amusing to watch. It's almost like a dog that gets too bored and stops responding to commands: you say "sit", and he looks at you, tilts his head, looks straight up at you as if to say "I know what you're saying...", but then decides to run to another room and bring back his toy. And you're left wondering: "Darn, where's that tough, obedient, smart Belgian Malinois that just a few hours ago was ready to take down a Bin Laden?" |
| |
| petesergeant 3 days ago |
| Speaking of anecdotes: 4o with canvas, which is normally excellent, tends to give up past a certain context length, and you have to copy and paste what you have into a new window to get it to make edits. |
|
|
| GaggiX 3 days ago |
| It has been for the last several months now. |
|
| maeil 3 days ago |
| This week, along with the 20 weeks before that :) Model improvement has slowed so much that the rankings aren't changing quickly anymore. And Anthropic has only widened the gap with 3.5-v2. |