Remix clone Hacker News

	mips_avatar 4 days ago
		I don't think the models are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing this. In voice especially, it's worth using a smaller local LLM to handle the acknowledgment before handing the request off.
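
A rough sketch of that agent-side pattern, not anyone's actual implementation: the slow request is started first, and a fast acknowledgment is emitted while it runs. Both `small_local_ack` and `large_model_answer` are hypothetical stand-ins (simulated here with sleeps), as are the latencies and the canned strings.

```python
import asyncio
from typing import AsyncIterator


async def small_local_ack(utterance: str) -> str:
    # Hypothetical stand-in for a small local model: fast, canned acknowledgment.
    await asyncio.sleep(0.01)
    return "Sure, one moment."


async def large_model_answer(utterance: str) -> str:
    # Hypothetical stand-in for the slower model that produces the real answer.
    await asyncio.sleep(0.2)
    return f"Full answer to: {utterance}"


async def respond(utterance: str) -> AsyncIterator[str]:
    # Start the slow request first so it runs while the acknowledgment is produced.
    answer_task = asyncio.create_task(large_model_answer(utterance))
    # The user hears something almost immediately.
    yield await small_local_ack(utterance)
    # Then the full answer follows once the larger model finishes.
    yield await answer_task


async def collect(utterance: str) -> list[str]:
    # Helper that gathers the streamed pieces in the order they were emitted.
    return [chunk async for chunk in respond(utterance)]


if __name__ == "__main__":
    for piece in asyncio.run(collect("what's the weather?")):
        print(piece)
```

The acknowledgment does not change the larger model's time to first token; it only hides the wait from the listener.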