Because they don't yet know how to "just stop emitting so much hot air" without also removing their ability to do anything like "thinking" (or whatever you want to call the transcript mode), which is hard because knowing which tokens are hot air is the hard problem itself.

They basically only started doing this because someone noticed you got better performance from the early models by straight up writing "think step by step" in your prompt.

Terr_ 2 hours ago | parent [-]

IMO it supports the framing that it's all just a "make document longer" problem, and that our human brains are primed for a kind of illusion: we perceive/infer a mind because, traditionally, a mind has been the only thing that produces such fitting language.

	ben_w 2 hours ago | parent [-]
		To an extent. Even though they're clearly improving, they also definitely look better than they actually are. This time last year they couldn't write compilable source code for a compiler for a toy language; I know because I tried.