| ▲ | cyanydeez 10 hours ago | |
Keep in mind that efficient KV caching needs to sit next to the GPU, so you also need your HA layer to keep routing the user to the same hardware. The hardware VM model is almost identical: each session can start anywhere, but a live session can't just be routed anywhere without a penalty. | ||
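
To make the routing constraint concrete, here is a rough sketch of session affinity, not how any particular load balancer does it. `StickyRouter`, the backend names, and the `rebuilds` counter are all hypothetical. A new session is placed on the least-loaded backend, later requests stick to that backend, and a session only moves when its backend is unhealthy, which is counted as a KV cache rebuild.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StickyRouter:
    """Toy session-affinity router. Every name here is hypothetical."""

    backends: list[str]
    pinned: dict[str, str] = field(default_factory=dict)  # session_id -> backend
    load: dict[str, int] = field(default_factory=dict)    # backend -> pinned sessions
    rebuilds: int = 0  # times a live session moved and lost its warm KV cache

    def __post_init__(self) -> None:
        for backend in self.backends:
            self.load.setdefault(backend, 0)

    def route(self, session_id: str, healthy: set[str] | None = None) -> str:
        """Return the backend for this request, keeping live sessions in place."""
        healthy = set(self.backends) if healthy is None else healthy
        current = self.pinned.get(session_id)

        # Live session whose backend is still up: stay put, the cache is warm.
        if current is not None and current in healthy:
            return current

        candidates = [b for b in self.backends if b in healthy]
        if not candidates:
            raise RuntimeError("no healthy backend available")

        if current is not None:
            # The session is forced to move, so its KV cache must be rebuilt.
            self.load[current] -= 1
            self.rebuilds += 1

        # New or displaced session: it can go anywhere, so pick the least loaded.
        target = min(candidates, key=lambda b: self.load[b])
        self.pinned[session_id] = target
        self.load[target] += 1
        return target


router = StickyRouter(["gpu-a", "gpu-b"])
first = router.route("s1")                        # free placement
again = router.route("s1")                        # sticks to the same backend
moved = router.route("s1", healthy={"gpu-b"})     # forced move, pays the penalty
```

A real setup would more likely use consistent hashing or a load balancer's built-in stickiness, but the trade-off is the same: placement is free at session start and costly afterwards.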