Remix.run Logo
danpalmer 4 days ago

My understanding is that TPUs do not use memory in the same way. GPUs need to do significantly more store/fetch operations from HBM, where TPUs pipeline data through systolic arrays far more. From what I've heard this generally improves latency and also reduces the overhead of supporting large context windows.