password54321 a day ago

Can perform out-of-distribution tasks at or around average human-level performance.

benzible a day ago | parent [-]

Every attempt to formally define "general intelligence" for humans has been a shitshow. IQ tests were literally designed to justify excluding immigrants and sterilizing the "feeble-minded." Modern psychometrics can't agree on whether intelligence is one thing (g factor) or many things, whether it's measurable across cultures, or whether the tests measure aptitude or just familiarity with test-taking and middle-class cultural norms.

Now we're trying to define AGI - artificial general intelligence - when we can't even define the G, much less the I. Is it "general" because it works across domains? Okay, how many domains? Is it "general" because it can learn new tasks? How quickly? With how much training data?

The goalposts have already moved a dozen times. GPT-2 couldn't do X, so X was clearly a requirement for AGI. Now models can do X, so actually X was never that important, real AGI needs Y. It's a vibes-based marketing term - like "artificial intelligence" was (per John McCarthy himself) - not a coherent technical definition.

password54321 20 hours ago | parent [-]

I think you are overthinking this. The ARC benchmark for fluid abstract reasoning was made in 2019 and it still hasn't been 'solved'. So the goalposts aren't moving as much as you think they are.
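To make the ARC format concrete: each task gives a handful of input→output grid pairs demonstrating a hidden transformation, and the solver must infer the rule and apply it to a fresh input. Here's a minimal toy sketch in Python (a hypothetical task, not drawn from the real ARC dataset, with a deliberately tiny hypothesis space):

```python
# Toy ARC-style solver: infer a grid transformation from a few examples.
# The candidate rules below are assumptions for illustration; real ARC
# tasks draw from a vastly larger, open-ended space of transformations.

def flip_horizontal(grid):
    return [row[::-1] for row in grid]

def flip_vertical(grid):
    return grid[::-1]

def transpose(grid):
    return [list(row) for row in zip(*grid)]

CANDIDATES = [flip_horizontal, flip_vertical, transpose]

def solve(train_pairs, test_input):
    """Return the first candidate rule consistent with every training pair,
    applied to the test input; None if no candidate fits."""
    for rule in CANDIDATES:
        if all(rule(inp) == out for inp, out in train_pairs):
            return rule(test_input)
    return None  # rule outside our hypothesis space: the hard part of ARC

# A task whose hidden rule is a horizontal flip:
train = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 6, 7]], [[7, 6, 5]]),
]
print(solve(train, [[4, 0], [0, 9]]))  # → [[0, 4], [9, 0]]
```

The point of the benchmark is that the rule is novel per task, so success requires inferring it from two or three examples rather than pattern-matching against training data.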

LLMs and neural nets in general have never been good at out-of-distribution tasks.