▲ | dylan604 17 hours ago
"ah, you hesitated": no more so than on every single other question. The delay while the GPT processes a response is very unnerving. I find it worse than when the news is interviewing a remote site with a delay between responses. Maybe if the eyes had LEDs to indicate activity rather than it just sitting there? Waiting for a GPT to do its thing is always going to force a delay, especially when pushing the request to the cloud for a response. Also, "GPT-4o continuously listens to speech through the audio stream" is going to be problematic.
▲ | jszymborski 17 hours ago
I wonder how well suited some of the smaller LLMs like Qwen 0.6B would be to this... it doesn't sound like a super complicated task. I also feel like you could train a model on this task by using the zero-shot performance of larger models to create a dataset, making something very zippy.
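A minimal sketch of that distillation idea, with stand-ins for both models: the `teacher` function here is a dummy for a large hosted LLM labeling utterances zero-shot, and the task ("command" vs "chatter") is just an illustrative guess at what the robot might classify. A real pipeline would call the big model's API and fine-tune something like Qwen 0.6B on the resulting pairs.

```python
def teacher(utterance: str) -> str:
    # Stand-in for a large model's zero-shot label on each utterance.
    # Toy heuristic only: questions count as commands for the robot.
    return "command" if utterance.endswith("?") else "chatter"

def build_dataset(utterances):
    # (input, teacher_label) pairs the small student model trains on.
    return [(u, teacher(u)) for u in utterances]

dataset = build_dataset(["what time is it?", "nice weather today"])
```

The zippy part comes afterwards: once the small model is fine-tuned on `dataset`, inference runs locally with no round trip to the cloud.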
▲ | accrual 17 hours ago
> also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic

This seems like a good place to leverage a wake word library, perhaps openWakeWord or Porcupine. Then the user could wake the device before sending the prompt off to an endpoint. It could even have a resting or snoozing animation, then perk up when the wake word triggers. Eerie to view, I'm sure...
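The gating flow could look like the sketch below. The detector is a stand-in keyword match so the control flow is runnable as-is; a real build would swap `detect_wake_word` for a library such as openWakeWord or Porcupine, which score raw audio frames rather than text. The "hey robot" phrase and state names are made up for illustration.

```python
from enum import Enum, auto

class State(Enum):
    SNOOZING = auto()   # resting animation, nothing leaves the device
    LISTENING = auto()  # wake word heard, next utterance goes to the endpoint

def detect_wake_word(chunk: str) -> bool:
    # Stand-in: real wake word engines consume audio frames, not text.
    return "hey robot" in chunk.lower()

def run(chunks):
    state = State.SNOOZING
    sent = []
    for chunk in chunks:
        if state is State.SNOOZING:
            if detect_wake_word(chunk):
                state = State.LISTENING   # perk-up animation here
        else:
            sent.append(chunk)            # only now push to the GPT endpoint
            state = State.SNOOZING        # back to snoozing after one utterance
    return sent
```

Only speech arriving after the wake word is ever uploaded; everything overheard while snoozing stays on the device.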
▲ | phh 3 hours ago
Kyutai's Unmute has great latency, but requires a fast, small-ish, non-thinking, non-tooled LLM. What I'm currently working on is merging both worlds: take the small LLM for the instant response, which will basically just be able to repeat what you said to show it understood, and have a big LLM do stuff in the background, feeding info back to the small LLM to explain the intermediary steps.
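A rough sketch of that fast/slow split. Both models are stubs here (nothing resembles Unmute's actual API); the point is the orchestration: the small model speaks immediately, while the big model runs on a worker thread and posts intermediate notes for the small model to voice.

```python
import queue
import threading
import time

def small_llm_ack(user_text):
    # Instant, shallow response: basically repeat what was heard.
    return f"Got it, you asked about: {user_text}"

def big_llm(user_text, notes):
    # Slow path: emit intermediary steps, then the real answer.
    notes.put("looking that up...")
    time.sleep(0.1)  # stand-in for real inference / tool-call latency
    notes.put(f"answer: something thorough about {user_text}")

def handle(user_text):
    notes = queue.Queue()
    worker = threading.Thread(target=big_llm, args=(user_text, notes))
    worker.start()
    spoken = [small_llm_ack(user_text)]  # said before the big model finishes
    worker.join()
    while not notes.empty():
        spoken.append(notes.get())       # small LLM voices these as they land
    return spoken
```

In a real system the small model would keep talking while draining the queue instead of joining the worker first, but the join keeps the sketch deterministic.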
▲ | justusthane 16 hours ago
> the delay for the GPT to process a response is very unnerving

I'm not sure I agree. The way the tentacle stops moving and shoots upright when you start talking to it gives me the intuitive impression that it's paying attention and thinking. Pretty cute!
▲ | tetha 15 hours ago
It clearly needs eyebrows like Johnny 5.
▲ | nebulous1 8 hours ago
> "ah, you hesitated" no more so than on every single other question.

It was longer, I think almost twice as long. It took about 2 seconds to respond generally, 4 seconds for that one.
▲ | micromacrofoot 14 hours ago
beyond the prototyping phase, which hosted models make very easy, there's little reason this couldn't use a very small optimized model on device... it would be significantly faster/safer in an end product (but significantly less flexible for prototyping)