> For instance, if a cutting-edge AI tool can expend $1000 worth of compute resources to solve an Olympiad-level problem, but its success rate is only 20%, then the actual cost required to solve the problem (assuming for simplicity that success is independent across trials) becomes $5000 on the average (with significant variability). If only the 20% of trials that were successful were reported, this would give a highly misleading impression of the actual cost required (which could be even higher than this, if the expense of verifying task completion is also non-trivial, or if the failures to solve the goal were correlated across iterations).

This is a very valid point. Google and ChatGPT announced they got the gold medal with specialized models, but what exactly does that entail? If one of them used a billion dollars in compute and the other a fraction of that, we should know about it. Error rates are equally important. Since there are conflicts of interest here, academia would be best suited for producing reliable benchmarks, but they would need access to closed models.
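Here is a minimal sketch that makes the quoted arithmetic concrete. Suppose each attempt costs c, succeeds with probability p, and attempts are independent, which is the simplification the quote itself makes. Then the number of attempts until the first success is geometric, so the expected spend is c/p and its standard deviation is c·√(1−p)/p. The function names are illustrative. `verify_cost` is a hypothetical per-attempt cost of checking an answer, added only to show how verification expense raises the total.

```python
import math
import random


def expected_cost(cost_per_attempt: float, success_rate: float, verify_cost: float = 0.0) -> float:
    """Expected total spend until the first success, assuming independent attempts.

    The attempt count N is geometric on {1, 2, ...} with E[N] = 1 / p.
    verify_cost is a hypothetical extra cost paid on every attempt.
    """
    return (cost_per_attempt + verify_cost) / success_rate


def cost_std(cost_per_attempt: float, success_rate: float, verify_cost: float = 0.0) -> float:
    """Standard deviation of total spend; Var[N] = (1 - p) / p**2 for a geometric N."""
    return (cost_per_attempt + verify_cost) * math.sqrt(1 - success_rate) / success_rate


def simulate(cost_per_attempt: float, success_rate: float, trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean spend, drawing independent attempts until one succeeds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= success_rate:  # failure with probability 1 - p
            attempts += 1
        total += attempts * cost_per_attempt
    return total / trials


print(expected_cost(1000, 0.2))    # 5000.0
print(round(cost_std(1000, 0.2)))  # 4472
print(round(simulate(1000, 0.2)))  # roughly 5000; exact value depends on the seed
```

With p = 0.2, the standard deviation (about $4,472) is nearly as large as the $5,000 mean, which is the "significant variability" the quote mentions. If failures were correlated across attempts, the independence assumption would no longer hold and the real cost could be higher than this model suggests.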

sojuz151 a day ago

Compute has been getting cheaper and models more optimised. So if models can do something now, it will not be long until they can do it cheaply.

EvgeniyZh a day ago

GPU compute per watt has grown by a factor of 2 in the last 5 years.

moffkalast 2 days ago

> with specialized models

> what exactly does that entail

Overfitting on the test set with models that are useless for anything else, that's what.

JohnKemeny 2 days ago

Don't put Google and ChatGPT in the same category here. Google cooperated with the organizers, at least.

ml-anon a day ago

Also, neither got a gold medal. Both solved enough problems to meet the threshold for a human child to get a gold medal, but saying they won one is like saying an F1 car got a gold medal in the 100m sprint at the Olympics.

bwfan123 a day ago

The Popular Science headline was funnier, with a pun on "out-mathed" [1]:

"Human teens beat AI at an international math competition. Google and OpenAI earned gold medals, but were still out-mathed by students."

[1] https://www.popsci.com/technology/ai-math-competition/

nmca a day ago

Indeed, it’s like saying a jet plane can fly!

vdfs a day ago

"Google F1 Preview Experimental beat the record of the fastest man on earth Usain Bolt"

spuz 2 days ago

Could you clarify what you mean by this?

raincole 2 days ago

Google's answers were judged by the IMO. OpenAI's were judged internally. Whether that matters is up to the reader.

EnnEmmEss 2 days ago

TheZvi has a summary of this here: https://thezvi.substack.com/i/168895545/not-announcing-so-fa... In short (there is nuance), Google cooperated with the IMO team while OpenAI didn't, which is why OpenAI announced before Google.