It would be next to impossible for anyone without insider knowledge to prove that to be the case.
Secondly, benchmarks are public data, and these models are trained on so much public data that it would be impractical to guarantee no benchmark data ends up in the training set. And even if it doesn't, it's safe to assume that the engineers building these models test them against all kinds of benchmarks and tweak them accordingly. This happens all the time in other industries as well.
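To make the contamination point concrete, here is a minimal sketch of what a naive decontamination check could look like, loosely in the spirit of the n-gram overlap filters some labs describe in their model reports. The function names, n-gram size, and threshold are all made up for illustration, not any lab's actual pipeline; the takeaway is that even this simple check has to run over every training document against every benchmark item, which gets expensive fast at web scale.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a piece of text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(train_doc: str, benchmark_items: list[str],
                    n: int = 8, threshold: float = 0.5) -> bool:
    """Flag a training document that shares a large fraction of its
    n-grams with any benchmark item (illustrative threshold)."""
    doc_ngrams = ngrams(train_doc, n)
    for item in benchmark_items:
        item_ngrams = ngrams(item, n)
        if not item_ngrams:
            continue
        overlap = len(doc_ngrams & item_ngrams) / len(item_ngrams)
        if overlap >= threshold:
            return True
    return False


# Toy example: a blog post quoting a benchmark prompt verbatim gets flagged.
benchmark = ["Generate an SVG of a pelican riding a bicycle"]
doc = "a post quoting the prompt: generate an svg of a pelican riding a bicycle"
print(is_contaminated(doc, benchmark))  # True
```

And that is the easy case: paraphrases, translations, and partial quotes slip straight past an exact n-gram match, which is why "we filtered the benchmarks out" is never an airtight guarantee.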
So the pelican-riding-a-bicycle test is interesting, but it's not a reliable performance indicator at this point.