Aurornis | 3 hours ago
The models outperform on the benchmarks relative to their performance on general tasks. The benchmarks are public, so they're all but guaranteed to be in the training sets by now. That means the benchmarks are no longer an indicator of general performance, because the models have already seen those specific tasks.

> And could quantization maybe explain the worse than expected results?

You can use the models through various providers on OpenRouter cheaply without quantization.
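To rule quantization out yourself, OpenRouter's OpenAI-compatible chat-completions endpoint accepts a provider-routing block that restricts which serving configurations your request can land on. This is a minimal sketch of such a request payload; the `quantizations` filter is taken from OpenRouter's provider-routing docs (verify the exact field names against the current docs), and the model name is just an example:

```python
import json

# Sketch of a request body for OpenRouter's chat-completions endpoint
# (https://openrouter.ai/api/v1/chat/completions). The "provider" block
# asks the router to only use providers serving full-precision weights.
payload = {
    "model": "meta-llama/llama-3.1-70b-instruct",  # example model, swap in yours
    "messages": [
        {"role": "user", "content": "Run one of the benchmark tasks here."}
    ],
    "provider": {
        # Restrict routing to unquantized bf16/fp16 deployments; field name
        # per OpenRouter's provider-routing docs (an assumption to verify).
        "quantizations": ["bf16", "fp16"],
    },
}

body = json.dumps(payload)
print(body)
```

Send `body` with an `Authorization: Bearer <your OpenRouter key>` header; if results on the same tasks don't improve at full precision, quantization wasn't the explanation.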
3 hours ago | parent
[deleted]