rib3ye 9 hours ago

How many tokens/sec?

roadside_picnic 8 hours ago | parent [-]

M3 Max laptop: ~55 tokens/sec

RTX 4090: ~190 tokens/sec

I don't have the numbers handy, but there is notable pre-fill latency on the M3. Once it's running, though, the delay is negligible.

The RTX, unsurprisingly, is superior all around performance-wise. But I use that computer for gaming and image gen work, so I can't dedicate it as a server, and the heat it generates under heavy load is noticeable, especially when it's warmer.