ricardobeat 2 days ago
If they use more tokens, isn’t that a case in favor of self-hosting to reduce costs? Or are you saying performance is not good enough for local agents?
regularfry 2 days ago
More tokens in the context means disproportionately more VRAM, to the extent that you really do need multiple GPUs if you're running an interestingly-sized model.
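
To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch of key/value cache size for a standard transformer decoder. The helper `kv_cache_bytes` and the model shape (32 layers, 8 KV heads, head dimension 128, fp16) are hypothetical values picked for illustration, not the specs of any model mentioned in the thread. The estimate also ignores the weights, activations, and framework overhead, and it does not account for cache quantization or other memory-saving tricks.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem=2, batch_size=1):
    """Approximate KV-cache size in bytes for a standard transformer decoder."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len * batch_size


# Hypothetical model shape, chosen for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions the cache costs 128 KiB per token, so it goes from about 1 GiB at 8k tokens to about 16 GiB at 128k. That sits on top of the weights, which is how a long agent context can push a model that fits on one card onto several.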