| fchollet 5 hours ago | |
It is 100% ARC-AGI-3 specific though, just read through the prompts https://github.com/symbolica-ai/ARC-AGI-3-Agents/blob/symbol... | ||
| boxed 3 hours ago | |
What a dick move. Making that prompt open source probably means that everyone else who doesn't want to cheat will scrape it anyway and accidentally cheat with their next models. | ||
| diwank 3 hours ago | |
this is so disingenuous on symbolica's part. these insincere announcements just make it harder for genuine attempts and novel ideas | ||
| DetroitThrow 4 hours ago | |
Um, yes, this is an extremely specific benchmark harness. It has a ton of knowledge encoded about the tasks at hand. The tweet is dishonest even in the best light. The hard part of these tests isn't purely reasoning ability ffs. | ||