Remix.run Logo
jgalt212 2 days ago

Distillation formerly was the key to self-hosted usable models. However, the unceasing pressure to be "agentic", has made self-hosting once again untenable. Agentic tools just hover up too many tokens.

ricardobeat 2 days ago | parent [-]

If they use more tokens isn’t that a case in favor of self-hosting to reduce costs? Or are you saying performance is not good enough for local agents?

regularfry 2 days ago | parent [-]

More tokens in the context means disproportionately more VRAM, to the extent that you really do need multiple GPUs if you're running an interestingly-sized model.