avaer 10 hours ago

There's also the Prompt API, currently in Origin Trial, which exposes this API surface to sites:

https://developer.chrome.com/docs/ai/prompt-api

I just checked the stats:

  Model Name: v3Nano
  Version: 2025.06.30.1229
  Backend Type: GPU (highest quality)
  Folder size: 4,072.13 MiB
Different use case but a similar approach.
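For reference, calling it looks roughly like this. This is a sketch based on the Chrome docs linked above; the API is still in Origin Trial, so the `LanguageModel` global and availability states may change, and the fallback-to-null behavior here is just one way to handle unsupported browsers:

```javascript
// Sketch of the Prompt API surface (Chrome Origin Trial;
// names and semantics may change before it ships).
async function summarizeWithPromptAPI(text) {
  // Feature-detect: the LanguageModel global only exists in
  // browsers with the Origin Trial enabled.
  if (typeof LanguageModel === "undefined") {
    return null; // caller falls back to a server-side model
  }
  // One of: "available" | "downloadable" | "downloading" | "unavailable"
  const availability = await LanguageModel.availability();
  if (availability === "unavailable") return null;

  const session = await LanguageModel.create({
    monitor(m) {
      // Progress events for the multi-GB model download.
      m.addEventListener("downloadprogress", (e) => {
        console.log(`Downloaded ${Math.round(e.loaded * 100)}%`);
      });
    },
  });
  const result = await session.prompt(`Summarize:\n\n${text}`);
  session.destroy();
  return result;
}
```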

I expect that at some point this will become a native web feature, but not anytime soon, since the model download is many multiples of the size of the browser itself. Maybe at some point these APIs could use LLMs built into the OS, like we do for graphics drivers.

michaelbuckbee 29 minutes ago | parent | next [-]

FWIW - I did a real-world experiment pitting the built-in Gemini Nano against a free equivalent from OpenRouter (a server call), and the free, server-side model was better in literally every performance metric.

That's not to say that in-browser inference isn't valuable for privacy and offline use, just that the standard case is currently pretty rough.

https://sendcheckit.com/blog/ai-powered-subject-line-alterna...

veunes 6 hours ago | parent | prev | next [-]

That’s exactly where we’re headed. Architecturally it makes zero sense to spin up an LLM in every app's userspace. Since we have dedicated NPUs and GPUs now, we need a unified system-level orchestrator to balance inference queues across programs, exactly how the OS mediates access to the NIC or the audio stack. The browser should just be making an IPC call to the system instead of hauling its own heavy inference engine along for the ride.

sheept 6 hours ago | parent | prev | next [-]

The Summarizer API is already shipped, and any website can use it to quietly trigger a 2 GB download by simply calling

    Summarizer.create()
(requires user activation)
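A more deliberate pattern is to check availability before creating anything, so the multi-gigabyte download only starts when the site opts in knowingly. This is a sketch assuming the Summarizer API as documented by Chrome (the `Summarizer` global and its availability states); the option values shown are illustrative:

```javascript
// Check model availability before triggering the download.
// Summarizer is a Chrome-only global per the Chrome docs.
async function getSummarizer() {
  if (typeof Summarizer === "undefined") return null;

  const availability = await Summarizer.availability();
  if (availability === "unavailable") return null;
  if (availability !== "available") {
    // "downloadable" or "downloading": calling create() here kicks
    // off (or joins) the ~2 GB model download, and must happen
    // inside a user activation, e.g. a click handler.
    console.log("Model not downloaded yet; create() will fetch it.");
  }
  return Summarizer.create({ type: "tldr", length: "short" });
}
```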
oyebenny 7 hours ago | parent | prev [-]

Interesting!