singhrac 3 days ago

I think we might just disagree about how much of the GPU spend goes to small vs. large models (inference or training). My sense is that something like 99.9% of the spending interest is in models that don't fit into 128 GB (remember, the KV cache matters too). Happy to be proven wrong! A rough back-of-the-envelope estimate is sketched below.
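
As a hedged illustration of why 128 GB is a meaningful cutoff, here is a rough memory estimate for a hypothetical 70B-parameter model served in fp16 with a grouped-query-attention layout. All of the configuration numbers (layer count, KV heads, head dim, context length) are assumptions for the sketch, not a claim about any specific model:

```python
# Illustrative memory estimate: weights + KV cache for a hypothetical
# 70B-parameter model in fp16. All configuration numbers are assumptions.

def weight_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight memory in GB for fp16/bf16 parameters."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

# Hypothetical 70B-class configuration with grouped-query attention.
weights = weight_memory_gb(70)                        # ~140 GB of weights alone
kv = kv_cache_gb(n_layers=80, n_kv_heads=8,
                 head_dim=128, seq_len=32_000, batch=1)  # ~10 GB for one 32k-token sequence
print(f"weights ~{weights:.0f} GB, KV cache ~{kv:.0f} GB, total ~{weights + kv:.0f} GB")
```

Under these assumptions the weights alone already exceed 128 GB before any KV cache or batching, which is the point about large-model spend not fitting on a single 128 GB device.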