tux3 2 hours ago
If you're trying to compare what the models are good at, it's important to note that the models were not run with the same settings. In one case they also retried with GPT until it answered all the problems, but did not retry with the other models. GPT has 5 effort settings and they picked the highest (xhigh). Claude has 5 and they picked the middle one, to avoid having to retry when it timed out. Gemini has medium or high effort and they picked medium.
christianstump 2 hours ago
The difference between GPT and Gemini concerning the "retry until..." can almost be ignored. I did rerun GPT a few times, but that was still way below the number of problems Gemini was not able to answer at all.