| ▲ | thot_experiment an hour ago | |||||||
False. The absolute capability is irrelevant, with the proper harness 31b is more than adequate for a very large portion of the tasks I ask AI to do. The metric isn't how good the model is at Erdos Problems, it's how reliably it can remove drudgery in my life. It just autonomously reverse engineered a bluetooth protocol with minimal intervention, it's ability to react to data and ground itself is constantly impressive to me. I do a ton of testing with these models, today I had Gemma answer a physics problem that Opus 4.7 gave up on. With a decent harness and context the set of tasks where their capabilities are both good enough is very surprising. The tasks I have that stump Gemma often also stump Opus 4.7. | ||||||||
| ▲ | amelius an hour ago | parent [-] | |||||||
This is like saying that 640kB is enough for anybody. | ||||||||
| ||||||||