Eonexus 2 hours ago

I wonder what the tokens per second actually are. Yes, it does say "reading speed" but that varies for everyone, no?

cafkafk 2 hours ago | parent

That is a very fair point! I just ran a not very scientific benchmark with the system under load, and posted the raw logs in a sibling comment above, but the short answer is that it's hitting 11.94 tokens per second for generation - while it's also being a binary cache and CI build server.

Totally just vibes based, I think it goes up to 20+ tps when it's not under load (and that's me trying to be conservative). For context, reading speed at 250 wpm would be around 5 to 6 tokens per second.
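
A rough sketch of where the 5 to 6 figure comes from, assuming about 1.3 tokens per English word. That ratio is a common rule of thumb, not something measured on this machine, and the real value depends on the tokenizer. The function name and the default are made up for illustration:

```python
def reading_speed_tps(words_per_minute: float, tokens_per_word: float = 1.3) -> float:
    """Convert a reading speed in words per minute to tokens per second.

    tokens_per_word is an assumed average (about 1.3 for English text);
    it is not taken from the benchmark in this thread.
    """
    words_per_second = words_per_minute / 60.0
    return words_per_second * tokens_per_word


reading_tps = reading_speed_tps(250)   # 250 wpm, as quoted above
measured_tps = 11.94                   # generation speed under load, from the comment above

print(f"reading speed: {reading_tps:.2f} tokens/s")
print(f"measured generation is {measured_tps / reading_tps:.1f}x reading speed")
```

With that assumed ratio, 250 wpm lands at about 5.4 tokens per second, so the 11.94 tps measured under load is roughly twice reading speed.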

Eonexus 2 hours ago | parent

Huh, that's actually not bad at all! Sure, it's not at the speed of a GPU, but still, 20 tps is cromulent for a CPU.