ricardobeat 5 hours ago
It's interesting how even 5 tok/s is still much faster than you'd typically type, but feels glacially slow for an agent. On the other hand, I've been using Mimo and Minimax a lot recently. They routinely reach 100-150 tokens per second and that feels too fast, to the point where it's hard to keep up with what it's actually doing. Great for subagents though.
danbruc 4 hours ago
> They routinely reach 100-150 tokens per second and that feels too fast, to the point where it's hard to keep up with what it's actually doing.

There is no way you can follow what is going on even at 30 tokens per second. Maybe you can maintain a rough idea of what is going on for some tens of seconds, but that is probably about it. Follow it in any detail? No chance. Reason about what you read? Absolutely no chance.

> 800 tok/s - Cerebras-class, where the bottleneck is your eyeballs

I do not understand why they say this, and I am not sure it is even true. 800 tokens sounds like a page of text, and I would assume you can look at one page per second without hitting any limitation of your eyes. Or is the resolution of the human eye not good enough to see an entire page at once, so that you have to scan it with the fovea? Scrolling text might of course hit the temporal resolution limit. But why does this even matter? Your brain cannot process anything close to the amount of information your eyes can take in.
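As a rough sanity check on the rates mentioned in this thread, here is a minimal sketch that converts tokens per second into words per minute. The 0.75 words-per-token ratio and the 250 wpm reading speed are rule-of-thumb assumptions, not figures from the comments, and the function names are made up for illustration.

```python
# Rule-of-thumb conversion; both constants are assumptions, not measurements.
WORDS_PER_TOKEN = 0.75   # assumed rough average for English text
READING_WPM = 250        # assumed comfortable reading speed


def tokens_per_sec_to_wpm(tok_per_sec: float) -> float:
    """Convert a generation rate in tokens/s to approximate words/minute."""
    return tok_per_sec * WORDS_PER_TOKEN * 60


def reading_ratio(tok_per_sec: float) -> float:
    """How many times faster than the assumed reading speed the output arrives."""
    return tokens_per_sec_to_wpm(tok_per_sec) / READING_WPM


if __name__ == "__main__":
    # Rates taken from the thread: 3.5, 5, 30, 150 and 800 tok/s.
    for rate in (3.5, 5, 30, 150, 800):
        wpm = tokens_per_sec_to_wpm(rate)
        print(f"{rate:>6} tok/s ~ {wpm:>7.0f} wpm ({reading_ratio(rate):.1f}x reading speed)")
```

Under those assumptions, 5 tok/s works out to 225 wpm, and 30 tok/s to 1350 wpm, about 5.4 times the assumed reading speed.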
metalliqaz 2 hours ago
I run models in the ~120B class on my old server (96GB DDR4) and it manages about 3-3.5 tok/sec. It is indeed painfully slow to watch, but I find that if I walk away or bury the window and do something else, it always seems to be done when I check back.