| ▲ | mg 4 days ago |
| Is anyone working on software that lets you run local LLMs in the browser? In theory it should be possible, shouldn't it? The page would hold only the JavaScript that uses WebGL to run the neural net, plus an "upload" button the user can click to select a model from their file system. The button would not actually upload the model to a server - it would just let the JS code access it, convert it into WebGL buffers and move it onto the GPU. This way, one could download models from HuggingFace, store them locally and use them as needed. Nicely sandboxed and independent of the operating system. |
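A minimal sketch of the file-selection part of that idea (names are illustrative and the actual WebGL inference is omitted) - the model bytes never leave the machine:

    <!-- Sketch: a page that reads a local model file entirely client-side. -->
    <input type="file" id="model-file" accept=".gguf,.onnx,.safetensors">
    <script>
      document.getElementById('model-file').addEventListener('change', async (event) => {
        const file = event.target.files[0];
        // Read the model locally via the File API; nothing is sent to any server.
        // (A real implementation would stream/chunk this - multi-GB models won't fit
        // comfortably in a single ArrayBuffer.)
        const bytes = await file.arrayBuffer();
        console.log(`Loaded ${file.name}: ${(bytes.byteLength / 1e9).toFixed(2)} GB`);
        // From here, the page's JS would parse the weights, upload them as
        // WebGL textures/buffers (or WebGPU buffers) and run the inference loop on the GPU.
      });
    </script>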
|
| ▲ | simonw 4 days ago | parent | next [-] |
| Transformers.js (https://huggingface.co/docs/transformers.js/en/index) is this. Some demos (should work in Chrome and Firefox on Windows, or Firefox Nightly on macOS and Linux):
https://huggingface.co/spaces/webml-community/llama-3.2-webg... loads a 1.24 GB Llama 3.2 q4f16 ONNX build.
https://huggingface.co/spaces/webml-community/janus-pro-webg... loads a 2.24 GB DeepSeek Janus Pro model, which is multi-modal for output - it can respond with generated images in addition to text.
https://huggingface.co/blog/embeddinggemma#transformersjs loads 400 MB for an EmbeddingGemma demo (embeddings, not LLMs).
I've collected a few more of these demos here: https://simonwillison.net/tags/transformers-js/
You can also get this working with web-llm - https://github.com/mlc-ai/web-llm - here's my write-up of a demo that uses that: https://simonwillison.net/2024/Nov/29/structured-generation-... |
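For reference, a minimal sketch of how a page typically drives Transformers.js; the package name, model id and options below are assumptions based on the library's v3 docs, not taken from the demos above:

    // Rough sketch of in-browser text generation with Transformers.js (v3-style API).
    import { pipeline } from '@huggingface/transformers';

    // Downloads (and caches) the ONNX weights, preferring WebGPU where available.
    const generator = await pipeline(
      'text-generation',
      'onnx-community/Llama-3.2-1B-Instruct',   // placeholder model id
      { dtype: 'q4f16', device: 'webgpu' }
    );

    const messages = [{ role: 'user', content: 'Explain WebGPU in one sentence.' }];
    const output = await generator(messages, { max_new_tokens: 64 });
    console.log(output[0].generated_text.at(-1).content);  // last message = the model's reply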
| |
| ▲ | mg 4 days ago | parent [-] | | This might be a misunderstanding. Did you see the "button that the user can click to select a model from their file system" part of my comment? I tried some of the demos of transformers.js, but they all seem to load the model from a server, which is super slow. I would like to have a page that lets me use any model I have on my disk. | | |
|
|
| ▲ | SparkyMcUnicorn 4 days ago | parent | prev | next [-] |
| Yes. MLC's inference engine runs on WebGPU/WASM. https://github.com/mlc-ai/web-llm-chat https://github.com/mlc-ai/mlc-llm https://github.com/mlc-ai/web-llm |
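A minimal sketch of what using web-llm looks like from page JavaScript; the model id is just an example and must match an entry in web-llm's prebuilt catalog:

    // Sketch: web-llm compiles models to WebGPU and exposes an OpenAI-style API in the browser.
    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    // Example/assumed model id - check web-llm's prebuilt model list for current entries.
    const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
      initProgressCallback: (report) => console.log(report.text),  // download/compile progress
    });

    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: "Hello from the browser!" }],
    });
    console.log(reply.choices[0].message.content);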
| |
| ▲ | mg 4 days ago | parent [-] | | Yeah, something like that, but without the WebGPU requirement. Neither Firefox nor Chromium supports WebGPU on Linux - maybe behind flags, but before using a technology, I would wait until it is available in the default config. Let's see when browsers bring WebGPU to Linux. | | |
|
|
| ▲ | generalizations 4 days ago | parent | prev | next [-] |
| This is an in-browser llama.cpp implementation: https://github.com/ngxson/wllama
Related is the in-browser Whisper implementation: https://ggml.ai/whisper.cpp/ |
|
| ▲ | vonneumannstan 4 days ago | parent | prev | next [-] |
| This one is pretty cool: it compiles the GGUF of an open-weights LLM directly into a single executable, which opens a chat interface in the browser and can also launch a locally hosted OpenAI-style API. It doesn't work quite as well on Windows due to the executable file size limit, but seems great for Mac/Linux. https://github.com/Mozilla-Ocho/llamafile |
|
| ▲ | adastra22 4 days ago | parent | prev | next [-] |
| You don’t need a browser to sandbox something. It's easier and more performant to do GPU passthrough to a container or VM. |
| |
| ▲ | 01HNNWZ0MV43FF 4 days ago | parent [-] | | A container or VM is a bigger commitment. VMs need root, and containers need the docker group plus something like docker-compose or a shell script. Idk, it's just like: do I want to run to the store, buy a 24-pack of water bottles and stash them somewhere, or do I want to open the tap and have clean drinking water? | | |
| ▲ | adastra22 3 days ago | parent [-] | | Neither of those requirements is true on recent OS versions. Users have had the ability to create containers or VMs without special privileges for a very long time now. |
|
|
|
| ▲ | paulirish 4 days ago | parent | prev | next [-] |
| Beyond all the wasm/webgpu approaches other folks have linked (mostly in the transformers.js ecosystem), there's been a standardized API brewing since 2019: https://webmachinelearning.github.io/webnn-intro/ Demos here: https://webmachinelearning.github.io/webnn-samples/ I'm not sure any of them allow you to select a model file from disk, but that should be entirely straightforward. |
|
| ▲ | samsolomon 4 days ago | parent | prev | next [-] |
| Is Open WebUI something like what you are looking for? The design has some awkwardness, but overall it's incorporated a ton of great features. https://openwebui.com/ |
| |
| ▲ | mg 4 days ago | parent [-] | | No, I'm looking for an HTML page with a button "Select LLM". After pressing that button and selecting a local LLM from disk, it would show an input field where you can type your question, and then it would use the given LLM to create the answer. I'm not sure what Open WebUI is, but if it were what I mean, they would surely have the page live and not ask users to install Docker etc. | | |
| ▲ | tmdetect 4 days ago | parent | next [-] | | I think what you want is this: https://github.com/mlc-ai/web-llm | |
| ▲ | bravetraveler 4 days ago | parent | prev | next [-] | | It's both what you want and not; the chat/question interface is as you describe, lack-of-installation is not. The LLM work is offloaded to other software, not the browser. I would like to skip maintaining all this crap, though: I like your approach | |
| ▲ | Jemaclus 4 days ago | parent | prev [-] | | You should install it, because it's exactly what you just described. Edit: From a UI perspective, it's exactly what you described. There's a dropdown where you select the LLM, and there's a ChatGPT-style chatbox. You just docker-up and go to town. Maybe I don't understand the rest of the request, but I can't imagine software where a webpage exists and just magically has LLMs available in the browser with no installation? | | |
| ▲ | craftkiller 4 days ago | parent | next [-] | | It doesn't seem exactly like what they are describing. The end-user interface is what they are describing but it sounds like they want the actual LLM to run in the browser (perhaps via webgpu compute shaders). Open WebUI seems to rely on some external executor like ollama/llama.cpp, which naturally can still be self-hosted but they are not executing INSIDE the browser. | | |
| ▲ | Jemaclus 4 days ago | parent [-] | | Does that even exist? It's basically what they described but with some additional installation? Once you install it, you can select the LLM on disk and run it? That's what they asked for. Maybe I'm misunderstanding something. | | |
| ▲ | craftkiller 4 days ago | parent [-] | | Apparently it does, though I'm learning about it for the first time in this thread also. Personally, I just run llama.cpp locally in docker-compose with AnythingLLM for the UI, but I can see the appeal of having it all just run in the browser. https://github.com/mlc-ai/web-llm
https://github.com/ngxson/wllama
| | |
|
| |
| ▲ | andsoitis 4 days ago | parent | prev [-] | | > You should install it, because it's exactly what you just described. Not OP, but it really isn't what they're looking for.
Needing to install stuff vs. simply going to a web page are two very different things. |
|
|
|
|
| ▲ | coip 4 days ago | parent | prev | next [-] |
| Have you seen/used the WebGPU spaces? https://huggingface.co/docs/transformers.js/en/guides/webgpu (edit: its predecessor was using WebGL) |
| |
| ▲ | mg 4 days ago | parent [-] | | WebGPU is not yet available in the default config of Linux browsers, so WebGL would have been perfect :) |
|
|
| ▲ | mudkipdev 4 days ago | parent | prev | next [-] |
| It was done with gemma-3-270m; I hope someone will post a link to it below |
|
| ▲ | vavikk 4 days ago | parent | prev [-] |
| Not the browser, but Electron. For the browser you would have to run a local Node.js server and point the browser app at its local API. I use Electron with Node.js and React for the UI. Yes, I can switch models. |
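A sketch of that local-API variant from the browser side, assuming an OpenAI-compatible server (for example llama.cpp's llama-server) is already running on localhost:8080 - the port and model name are assumptions:

    // Browser-side sketch: the page does no inference itself, it just calls a local server.
    async function ask(question) {
      const res = await fetch("http://localhost:8080/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "local-model",  // many local servers ignore or loosely match this field
          messages: [{ role: "user", content: question }],
        }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }

    ask("Which model are you?").then(console.log);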