danielcampos93 11 hours ago
I would love to know what the token counts are across these models on the benchmarks. I find the models continue to get better, but their token usage grows along with them. In other words, is the model actually doing better, or just reasoning for longer?
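A minimal sketch of the comparison I mean, assuming the OpenAI Python SDK; the model names and benchmark items below are placeholders, and `usage.completion_tokens` is how that SDK reports generated tokens:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical benchmark items: (question, substring the answer should contain).
ITEMS = [
    ("What is 17 * 24?", "408"),
    ("What is the capital of Australia?", "Canberra"),
]

# Placeholder model names; swap in the models under test.
MODELS = ["gpt-4o-mini", "o3-mini"]

for model in MODELS:
    correct = 0
    tokens = 0
    for question, expected in ITEMS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answer = resp.choices[0].message.content or ""
        correct += expected in answer
        # completion_tokens counts all generated tokens, reasoning included.
        tokens += resp.usage.completion_tokens
    print(f"{model}: {correct}/{len(ITEMS)} correct, "
          f"{tokens / len(ITEMS):.0f} completion tokens per item")
```

Reporting tokens-per-item next to accuracy is what would let you see whether a score gain came with a proportional (or worse) jump in token spend.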
jstummbillig 10 hours ago
I think that is always being worked on in parallel. The recent paradigm seems to be models deciding dynamically when they need to spend more tokens (which seems very much in line with how computation should generally work).
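For what it's worth, today that budget is often still a knob the caller turns rather than something the model infers entirely on its own. A rough sketch of that knob, assuming the OpenAI Python SDK and a reasoning model that accepts the `reasoning_effort` parameter (model name and usage field paths are as that SDK exposed them last I checked):

```python
from openai import OpenAI

client = OpenAI()

PROMPT = ("A bat and a ball cost $1.10, and the bat costs $1.00 more "
          "than the ball. How much is the ball?")

# Same prompt at three effort levels; compare how many reasoning tokens each spends.
for effort in ("low", "medium", "high"):
    resp = client.chat.completions.create(
        model="o3-mini",  # placeholder reasoning model
        reasoning_effort=effort,
        messages=[{"role": "user", "content": PROMPT}],
    )
    usage = resp.usage
    reasoning = usage.completion_tokens_details.reasoning_tokens
    print(f"effort={effort}: {reasoning} reasoning tokens "
          f"of {usage.completion_tokens} completion tokens")
```

The dynamic version of this would be the model picking the effort level itself based on the difficulty of the prompt, rather than the caller fixing it up front.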