| ▲ | forrestthewoods 3 hours ago | |||||||
At the end of the day, “feel” is what people rely on to pick which tool they use. Is feel unscientific and broken? Sure, maybe, why not. But at the end of the day I’m going to choose what I see with my own two eyes over a number in a table. Benchmarks are a sometimes-useful tool. But we are in prime Goodhart’s Law territory. | ||||||||
| ▲ | AstroBen 3 hours ago | |||||||
Yeah, to be honest it probably doesn't matter too much. I think the major models are very close in capabilities. | ||||||||
| ||||||||