jonathanleane 4 hours ago
Top solve rate is currently 24% with Opus 4.8... What's a competent human supposed to score?
jascha_eng an hour ago
I mean, I assume these were all solved before, so 100%, though not by the same human, of course. But models are expected to be good across a variety of codebases, while a human can specialize in one and learn it. I think it's fair to compare against an individual who is used to working on that product. I'm more interested in how fable would do.
lacunary 4 hours ago
Presumably whatever the top model scores and then some, since the human can use the model. I wonder if a model could score higher if it had a human at its disposal?