oceanplexian 7 days ago

I can’t speak to the Tesla stuff, but I run an Epyc 7713 with a single 3090, and by creatively splitting the model between the GPU and 8 channels of DDR4 I can do about 9 tokens per second on a q4 quant.

CamperBob2 7 days ago | parent

Impressive. Is that a distillation, or the real thing?