whimsicalism 3 days ago
Makes sense, I think streaming audio->audio inference is a relatively big lift.
red2awn 2 days ago
Correct, it breaks the single-prompt, single-completion assumption baked into the frameworks. Conceptually it's still prompt/completion, but for a low-latency response you have to do streaming KV cache prefill with a websocket server.
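To make the streaming prefill idea concrete, here is a minimal sketch, not how any particular framework does it. Everything in it is an assumption for illustration: `ToyKVCache`, `client`, `server`, and `run_session` are hypothetical names, the token ids are made up, and an `asyncio.Queue` stands in for the websocket connection so the example runs with the standard library alone. The point is only that the server extends the cache as each chunk arrives, so when the input ends just the last chunk is left to process.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Stand-in for a transformer KV cache: one entry per token position."""
    entries: list = field(default_factory=list)

    def prefill(self, tokens):
        # A real model would run a forward pass over `tokens`, attending to
        # everything already cached. Here we only record (position, token).
        start = len(self.entries)
        for offset, tok in enumerate(tokens):
            self.entries.append((start + offset, tok))
        return len(tokens)  # tokens processed in this call


async def client(queue, chunks):
    """Hypothetical client: sends audio-token chunks as they are captured."""
    for chunk in chunks:
        await queue.put(chunk)
        await asyncio.sleep(0)  # yield, as if waiting for more audio
    await queue.put(None)  # end-of-input marker (an assumed convention)


async def server(queue):
    """Prefill each chunk on arrival instead of waiting for the full prompt."""
    cache = ToyKVCache()
    work_per_chunk = []
    while (chunk := await queue.get()) is not None:
        work_per_chunk.append(cache.prefill(chunk))
    return cache, work_per_chunk


async def run_session(chunks):
    queue = asyncio.Queue()  # stands in for the websocket connection
    sender = asyncio.create_task(client(queue, chunks))
    cache, work = await server(queue)
    await sender
    return cache, work


chunks = [[11, 12, 13], [14, 15], [16]]
streamed_cache, work = asyncio.run(run_session(chunks))

# Baseline: the usual single-prompt prefill over the whole input at once.
one_shot = ToyKVCache()
one_shot.prefill([tok for chunk in chunks for tok in chunk])
```

Both paths end with the same cache contents, which is the "still prompt/completion" part. The difference is when the work happens: the one-shot path does all of it after the input ends, while the streamed path has only the final chunk left at that point.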