veunes 5 hours ago

Sure, 26B models on beefy desktop silicon are finally nipping at the heels of commercial APIs, but this is a mobile thread. On a phone with 8GB of RAM and passive cooling, your tokens per second (t/s) are going to fall off a cliff after the first minute of sustained compute.