stephc_int13 · 4 hours ago
What I would do, if I were in the position of a large company in this space, is assemble an internal team to create an ARC replica covering very similar puzzles, and use that as part of the training data. Ultimately, most benchmarks can be gamed, so their real utility is short-lived. But I also think it's fair to use any means to beat them.
tylervigen · 4 hours ago
I agree that for any given test, you could build a specific pipeline to optimize for that test. I suppose that's why it's helpful to have many tests. However, many people have worked hard over the years to optimize tools specifically for ARC, and it has proven a particularly hard test to optimize for. That's why I find it so interesting that LLMs can do well on it at all, regardless of whether tests like it are included in the training data.

AstroBen · an hour ago
Is "good at benchmarks instead of real-world tasks" really something to optimize for? What does it achieve? Surely people would be initially impressed, try it out, be underwhelmed, and move on. That's not great for Google.

simpsond · 4 hours ago
Humans study for tests. They just tend to forget.