pu_pe 2 days ago
> For instance, if a cutting-edge AI tool can expend $1000 worth of compute resources to solve an Olympiad-level problem, but its success rate is only 20%, then the actual cost required to solve the problem (assuming for simplicity that success is independent across trials) becomes $5000 on the average (with significant variability). If only the 20% of trials that were successful were reported, this would give a highly misleading impression of the actual cost required (which could be even higher than this, if the expense of verifying task completion is also non-trivial, or if the failures to solve the goal were correlated across iterations).

This is a very valid point. Google and ChatGPT announced they got the gold medal with specialized models, but what exactly does that entail? If one of them used a billion dollars in compute and the other a fraction of that, we should know about it. Error rates are equally important. Since there are conflicts of interest here, academia would be best suited for producing reliable benchmarks, but they would need access to closed models.
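For anyone who wants to check the arithmetic in the quote, here is a minimal Python sketch. It assumes, as the quote does, that attempts are independent, so the number of attempts until the first success follows a geometric distribution. The function names and the optional `verify_cost` parameter are made up for illustration; they don't come from the quoted text or from any lab's reporting.

```python
import math
import random


def expected_cost(cost_per_attempt, success_rate, verify_cost=0.0):
    """Mean total spend until the first success, assuming independent attempts.

    verify_cost is a hypothetical per-attempt cost of checking the answer.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Geometric distribution: the mean number of attempts is 1 / p.
    return (cost_per_attempt + verify_cost) / success_rate


def cost_std_dev(cost_per_attempt, success_rate, verify_cost=0.0):
    """Standard deviation of total spend under the same independence assumption."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Geometric distribution: std of the attempt count is sqrt(1 - p) / p.
    per_attempt = cost_per_attempt + verify_cost
    return per_attempt * math.sqrt(1 - success_rate) / success_rate


def simulate_mean_cost(cost_per_attempt, success_rate, runs=100_000, seed=0):
    """Monte Carlo estimate of the mean spend, as a sanity check on the formula."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        attempts = 1
        # Keep retrying until an attempt succeeds.
        while rng.random() >= success_rate:
            attempts += 1
        total += attempts * cost_per_attempt
    return total / runs


if __name__ == "__main__":
    print(expected_cost(1000, 0.2))       # mean cost for the quoted numbers
    print(cost_std_dev(1000, 0.2))        # spread around that mean
    print(simulate_mean_cost(1000, 0.2))  # simulated mean
```

With the quoted numbers the mean comes out at $5000 and the standard deviation at roughly $4500, which is the "significant variability" mentioned in the quote. If failures are correlated across attempts, the independence assumption breaks and this sketch no longer applies.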
sojuz151 a day ago
Compute has been getting cheaper and models more optimised, so if models can do something, it will not be long until they can do it cheaply.
moffkalast 2 days ago
> with specialized models

> what exactly does that entail

Overfitting on the test set with models that are useless for anything else, that's what.
JohnKemeny 2 days ago
Don't put Google and ChatGPT in the same category here. Google cooperated with the organizers, at least.