Fascinatingly, as we found out from this HN post, Markov chains don't work when scaled up (for technical reasons), so that whole transformers thing is actually necessary for the current generation of AI.
https://news.ycombinator.com/item?id=45958004
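One commonly cited reason is sparsity. We're assuming this is close to what the thread discusses, but it may not match it exactly. An order-n Markov chain keeps a separate table entry for every n-token context it has seen. The number of possible contexts grows as |V|^n, while a real corpus only ever covers a tiny fraction of them. Here's a minimal sketch on a toy corpus. The corpus and the function names `build_chain` and `coverage` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, tokenized on whitespace.
TOKENS = "the cat sat on the mat and the dog sat on the log".split()


def build_chain(tokens, order):
    """Map each length-`order` context (a tuple) to a Counter of next tokens."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        chain[context][tokens[i + order]] += 1
    return chain


def coverage(tokens, order):
    """Return (contexts actually observed, contexts theoretically possible)."""
    vocab_size = len(set(tokens))
    observed = len(build_chain(tokens, order))
    possible = vocab_size ** order
    return observed, possible


if __name__ == "__main__":
    for order in (1, 2, 3):
        seen, possible = coverage(TOKENS, order)
        print(f"order {order}: {seen} of {possible} possible contexts seen")
    # A context that never occurred in training has no next-token counts at all.
    print(build_chain(TOKENS, 3).get(("the", "dog", "ran")))
```

Even on this 13-token corpus the covered fraction drops from 7/8 to 9/512 by order 3. Any unseen context simply has no prediction. A model that generalizes across similar contexts has to fill that gap.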