Octoth0rpe 6 hours ago

> A single patched llama-server runs on K3s, providing both generation with speculative decoding (~100 tok/s)

There seems to be at least some detail on that point.