| ▲ | regularfry 3 hours ago | |
They're claiming 20+ tok/s inference on a MacBook with the Unsloth quant. | ||
| ▲ | embedding-shape 5 minutes ago | |
Yeah, I'm guessing Mac users still aren't very fond of sharing how long prefill takes. They usually only share the output (decode) tok/s, never the input (prompt processing) speed. | ||
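To put rough numbers on why the prefill figure matters, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the only figure taken from the thread is the 20 tok/s decode rate; the prefill rate and token counts are hypothetical placeholders, not measurements from any machine.

```python
def request_latency_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Split one request's wall-clock time into prefill and decode phases.

    prefill_tps: prompt-processing speed in tokens/second (input side).
    decode_tps:  generation speed in tokens/second (the number usually quoted).
    Returns (prefill_seconds, decode_seconds, total_seconds).
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s, decode_s, prefill_s + decode_s


def effective_output_tps(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Output tokens per second once the prefill wait is counted too."""
    _, _, total_s = request_latency_s(
        prompt_tokens, output_tokens, prefill_tps, decode_tps
    )
    return output_tokens / total_s


# Hypothetical scenario: an 8000-token prompt and a 500-token answer,
# decode at the claimed 20 tok/s, prefill at an ASSUMED 200 tok/s.
EXAMPLE = dict(prompt_tokens=8000, output_tokens=500,
               prefill_tps=200.0, decode_tps=20.0)
```

With those assumed inputs, `request_latency_s(**EXAMPLE)` comes out to 40 s of prefill plus 25 s of decode, so the effective output rate over the whole request is well under the quoted 20 tok/s. The longer the prompt relative to the answer, the more the decode-only number flatters the setup.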