simonw 2 hours ago

Qwen still have the best models that actually run on a laptop - Gemma 4 is their best competition there.

zozbot234 24 minutes ago

That's only really true if one ignores the possibility of SSD offloading, which effectively opens up inference with far larger models. It's possible that the combination of batched inference and SSD streaming may be even more effective, though only for selected models with especially efficient KV storage, or perhaps very small inference contexts.
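
To put rough numbers on that trade-off, here is a back-of-envelope sketch in Python. It is not a benchmark. It assumes every pass has to read the full set of active weights from the SSD, that the KV cache stays in RAM, and that compute is free. The function names and the model dimensions (48 layers, 8 KV heads, head size 128, fp16 cache, 50 GB of weights read per pass, 5 GB/s SSD, 16 GiB of RAM for the cache) are all made up for illustration and do not describe any particular model.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache size for one token of one sequence.

    Keys and values are both stored for every layer, hence the factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_batch(ram_bytes, context_len, kv_per_token):
    """How many sequences of context_len tokens fit in the RAM set aside for KV."""
    return ram_bytes // (context_len * kv_per_token)


def streamed_tokens_per_sec(ssd_bytes_per_sec, weight_bytes_per_pass, batch):
    """Upper bound on aggregate tokens/s when weights are streamed from SSD.

    One pass over the weights produces one token per sequence in the batch,
    so the cost of reading the weights is shared across the whole batch.
    """
    return batch * ssd_bytes_per_sec / weight_bytes_per_pass


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
kv = kv_bytes_per_token(n_layers=48, n_kv_heads=8, head_dim=128)
ram_for_kv = 16 * 1024**3   # 16 GiB reserved for the KV cache
ssd_bw = 5e9                # 5 GB/s sequential read
weights = 50e9              # 50 GB of weights read per pass

for context_len in (8192, 1024):
    batch = max_batch(ram_for_kv, context_len, kv)
    single = streamed_tokens_per_sec(ssd_bw, weights, 1)
    batched = streamed_tokens_per_sec(ssd_bw, weights, batch)
    print(f"context={context_len}: batch={batch}, "
          f"{single:.2f} tok/s alone, {batched:.2f} tok/s batched")
```

Under these assumptions the batch size is capped by how much KV cache fits in RAM, which is why a compact KV layout or a short context matters so much: dropping the context from 8192 to 1024 tokens lets roughly eight times as many sequences share each pass over the weights.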