saubeidl 7 hours ago | parent | next
Those are SOTA for open models. It's a separate league from closed models entirely.
supermatt 7 hours ago | parent

> It's a separate league from closed models entirely.

To be fair, the SOTA models aren't even a single LLM these days. They are doing all manner of tool use and specialised submodel calls behind the scenes - a far cry from in-model MoE.
|
|
tarruda 8 hours ago | parent | prev
> Do you disagree with that?

I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is odd: both Qwen3 8B and 4B actually score higher, so they used this weird chart where the x-axis shows the number of output tokens. I don't see the point of it.
meatmanek 4 hours ago | parent

Generation time is more or less proportional to tokens * model size, so if you can get the same quality result with fewer tokens from the same size of model, then you save time and money.
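
A rough back-of-the-envelope sketch of that cost model (the proportionality and every number below are illustrative assumptions, not figures from the benchmark or the thread):

```python
# Toy cost model, assuming decode time scales with output_tokens * parameter count.
def relative_decode_cost(output_tokens: int, model_params_b: float) -> float:
    """Relative cost of generating a response: tokens times model size in billions."""
    return output_tokens * model_params_b

# Hypothetical scenario: a 4B model that needs 3000 reasoning tokens vs. an 8B model
# that reaches the same answer quality in 800 tokens.
small_verbose = relative_decode_cost(output_tokens=3000, model_params_b=4)  # 12000 units
large_terse = relative_decode_cost(output_tokens=800, model_params_b=8)     # 6400 units

print(f"4B model, 3000 tokens -> {small_verbose} cost units")
print(f"8B model,  800 tokens -> {large_terse} cost units")
# Under this assumption the bigger-but-terser model is cheaper, which is why plotting
# accuracy against output tokens can be more informative than the raw score alone.
```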