refulgentis 2 days ago

I guess I'd ask: why is the Framework perceived as GPU poor? I don't have one, but I also don't know why its TTFT would be significantly worse than on M-series (it's a good GPU!)

CamperBob2 2 days ago

Compared to 4x RTX 6000 Blackwell boards, it's GPU poor. There has to be a reason they want to load up a tower chassis with $35K worth of GPUs, right? I'd have to assume it has strong advantages for inference as well as training, given that the GPU has more influence on TTFT with longer contexts than the CPU does.
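To see why GPU compute dominates TTFT on long prompts, here is a rough sketch. It assumes prefill costs about 2 FLOPs per parameter per prompt token and ignores the attention term, which grows quadratically with context, as well as memory-bandwidth limits. The function name, the 50% utilization factor, and the example model size and TFLOPS figures are hypothetical illustrations, not measured specs of the Framework, M-series, or RTX 6000 hardware.

```python
def prefill_seconds(params: float, prompt_tokens: int,
                    peak_tflops: float, utilization: float = 0.5) -> float:
    """Rough compute-bound estimate of prefill time (a proxy for TTFT).

    Assumes ~2 FLOPs per parameter per prompt token for a dense forward
    pass and ignores the quadratic attention cost, so real long-context
    prefill will be slower than this estimate.
    """
    total_flops = 2.0 * params * prompt_tokens
    effective_flops_per_s = peak_tflops * 1e12 * utilization
    return total_flops / effective_flops_per_s


# Hypothetical 70B dense model with a 32K-token prompt on two GPUs that
# differ 10x in peak throughput: the estimated TTFT differs 10x too.
slow = prefill_seconds(70e9, 32_000, peak_tflops=50)
fast = prefill_seconds(70e9, 32_000, peak_tflops=500)
print(f"slow GPU: {slow:.1f} s, fast GPU: {fast:.1f} s")
```

Because prefill scales linearly with prompt length in this model (and worse once attention is included), the gap between a modest GPU and a multi-card workstation widens in absolute seconds as context grows.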

refulgentis 2 days ago

Right, but I'd suggest that the fact that 128 GB of GPU RAM gives you an 8K context is a sign it may be worth revisiting priors such as "it has strong advantages for inference as well as training".

As Mr. Hildebrand used to say, when you assume, you make...

(Also note that the article itself specifically frames this spec-out as being about training :) so it's not just me suggesting it.)