woadwarrior01 2 hours ago

GPUs are a near monopoly. There are at least a handful of big players in the CPU space. Competition alone makes the latter space a lot cheaper.

Also, for inference (as opposed to training) there are other ways to do matmuls efficiently besides the GPU. You might want to look up Apple's undocumented AMX CPU ISA, and also the thing that vendors call the "Neural Engine" in their marketing (its capabilities and the term's specific meaning vary broadly from vendor to vendor).

For small 1-3B parameter transformers like TADA, both of these options are much more energy efficient than GPU inference.