phh 2 days ago

Kyutai's Unmute has great latency, but requires a fast, small-ish, non-thinking, non-tool-using LLM. What I'm currently working on is merging both worlds: use the small LLM for an instant response, which will basically just repeat back what you said to show it understood, and have a big LLM do the work in the background, feeding info back to the small LLM so it can explain the intermediate steps.
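The two-tier setup described above can be sketched roughly like this. Everything here is hypothetical: the model calls are stubbed, and `fast_llm`, `slow_llm`, and the queue-based hand-off are my own illustrative names, not anything from Unmute or Kyutai.

```python
import asyncio

async def fast_llm(prompt: str) -> str:
    # Stand-in for a small, low-latency model: it just acknowledges
    # the request so the user gets an instant response.
    return f"Got it -- you asked about: {prompt}"

async def slow_llm(prompt: str, updates: asyncio.Queue) -> str:
    # Stand-in for a large model that reports intermediate steps
    # back through a queue while it works.
    for step in ("parsing the question", "retrieving context", "drafting answer"):
        await updates.put(step)
        await asyncio.sleep(0.01)  # simulate compute time
    return f"Full answer for: {prompt}"

async def pipeline(prompt: str) -> list[str]:
    updates: asyncio.Queue = asyncio.Queue()
    spoken: list[str] = []

    # 1. Instant acknowledgement from the small model.
    spoken.append(await fast_llm(prompt))

    # 2. Big model runs in the background.
    task = asyncio.create_task(slow_llm(prompt, updates))

    # 3. Relay intermediate steps to the user while waiting,
    #    draining the queue even after the task finishes.
    while not task.done() or not updates.empty():
        try:
            step = await asyncio.wait_for(updates.get(), timeout=0.05)
            spoken.append(f"(still working: {step})")
        except asyncio.TimeoutError:
            pass

    # 4. Finally deliver the big model's full answer.
    spoken.append(await task)
    return spoken

if __name__ == "__main__":
    for line in asyncio.run(pipeline("what's the weather?")):
        print(line)
```

In a real voice pipeline the queue items would presumably be summarized by the small LLM into natural filler speech rather than relayed verbatim, but the control flow is the same: respond first, compute later, narrate in between.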

endymion-light 2 days ago | parent [-]

This is the key aspect for future development of models: a small model for instant responses, ideally on-device, that funnels through to a larger model for deeper reasoning.