dbgrman 2 hours ago
That's a pretty cynical take.

> past the point of human ability to discern whether they are actually better or worse

This is a lack of imagination. If you use these models heavily enough, pretty soon you'll hit the edges of their capabilities. The smarter among us are collecting these problems into a personal benchmark and using it to judge model capability. I think this is the right approach and, dare I say, even better than generic benchmarks. To me, it matters less what a generic benchmark says and more how a model handles my particular problems.