oceanplexian 7 days ago
I can’t speak to the Tesla stuff, but I run an Epyc 7713 with a single 3090, and by creatively splitting the model between the GPU and 8 channels of DDR4 I get about 9 tokens per second on a q4 quant.
CamperBob2 7 days ago
Impressive. Is that a distillation, or the real thing?