AstroBen, 3 hours ago
Is "good at benchmarks instead of real-world tasks" really something to optimize for? What does this achieve? Surely people would be impressed at first, try it out, be underwhelmed, and then move on. That's not great for Google.
nomel, an hour ago (reply to AstroBen)
If they're memory- or reference-constrained systems that can't directly "store" every solution, then doing well on benchmarks should result in better real-world reasoning performance, since without a memorized answer, solving the problem requires understanding. As with humans [1], generalized reasoning ability lets you skip storing that solution directly, along with many, many others, entirely! You can just synthesize a solution when a problem is presented.
stephc_int13, 3 hours ago (reply to AstroBen)
Benchmarks are intended as a proxy for real usage, and they are often useful for incrementally improving a system, especially when the end goal is not well defined. The trick is not to read more into the score than it actually measures.
spprashant, 3 hours ago (reply to AstroBen)
Initial impressions are currently worth a lot. In the long run I think the moat will dissolve, but right now it's a race to lock users into your model and make switching costs high.