avereveard 4 hours ago
"astounding how much the harness matters" is the right read and it should be the lasting one. the model is rentable, the prompts are rentable, and the benchmark numbers are mostly a function of the harness around them. swapping Gemini for Sonnet underneath the same harness produces a smaller bench delta than swapping the harness around the same model. the cheating-agents post you linked is the same observation through a different lens: the harness is what's being measured, the model is just the substrate. that said, context management seems to be solving today's model problems more than being a universal property, and it will probably be obsoleted a few model generations down the road, just as tool use obsoleted RAG context injection from question embeddings.
himata4113 4 hours ago
That's why ARC-AGI-3 doesn't allow the use of a harness. The model has to create the harness instead.