| ▲ | Sir_Twist 2 days ago | |
Not an expert in LLM benchmarks, but I generally I think of benchmarks as being good particularly for measuring usefulness for certain usecases. Even if measuring LLMs is not as straightforward as, say, read/write speeds when comparing different SSDs, if a certain model's responses are consistently measured as being higher quality / more useful, surely that means something, right? | ||