Remix clone Hacker News

new | show | ask | jobs Github

	▲	scottyeager 4 days ago
		Fast inference can change the entire dynamic or working with these tools. At the typical speeds, I usually try to do something else while the model works. When the model works really fast, I can easily wait for it to finish. So the total difference includes the cost of context switching, which is big. Potentially speed matters less in a scenario that is focused on more autonomous agents running in the background. However I think most usage is still highly interactive these days.