hnfong a day ago
I don't trust the benchmarks either, so I've maintained a set of benchmarks myself. I'm mostly interested in local models, and for the past two years they have steadily gotten better. Can't argue with subjective experience, but if there were tasks you thought LLMs couldn't do two years ago, maybe try again today. You might be surprised.