armcat | 3 hours ago
This is an outstanding write-up, thank you! Regarding LLM latency, OpenAI recently added WebSocket support to their Responses client, so it should be a bit faster. An alternative is to run a very small LLM locally on your device. I built my own fully local pipeline and got sub-second RTT, with no streaming or optimisations: https://github.com/acatovic/ova
nicktikhonov | 3 hours ago | parent
Very cool! Starred and added to my reading list. Would love to chat and share notes, if you'd like.