| ▲ | mg 4 days ago |
| Is anyone working on software that lets you run local LLMs in the browser? In theory it should be possible, shouldn't it? The page would hold only the JavaScript that uses WebGL to run the neural net, plus an "upload" button the user can click to select a model from their file system. The button would not actually upload the model to a server - it would just let the JS code access it, convert it into WebGL buffers and move it onto the GPU. This way, one could download models from HuggingFace, store them locally and use them as needed. Nicely sandboxed and independent of the operating system. |
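A minimal sketch of the file-selection part of that idea (names are illustrative and the actual WebGL inference is omitted) - the model bytes never leave the machine:

    <!-- Sketch: a page that reads a local model file entirely client-side. -->
    <input type="file" id="model-file" accept=".gguf,.onnx,.safetensors">
    <script>
      document.getElementById('model-file').addEventListener('change', async (event) => {
        const file = event.target.files[0];
        // Read the model locally via the File API; nothing is sent to any server.
        // (A real implementation would stream/chunk this - multi-GB models won't fit
        // comfortably in a single ArrayBuffer.)
        const bytes = await file.arrayBuffer();
        console.log(`Loaded ${file.name}: ${(bytes.byteLength / 1e9).toFixed(2)} GB`);
        // From here, the page's JS would parse the weights, upload them as
        // WebGL textures/buffers (or WebGPU buffers) and run the inference loop on the GPU.
      });
    </script>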
|
| ▲ | simonw 4 days ago | parent | next [-] |
| Transformers.js (https://huggingface.co/docs/transformers.js/en/index) is this. Some demos (should work in Chrome and Firefox on Windows, or Firefox Nightly on macOS and Linux):
https://huggingface.co/spaces/webml-community/llama-3.2-webg... loads a 1.24 GB Llama 3.2 q4f16 ONNX build.
https://huggingface.co/spaces/webml-community/janus-pro-webg... loads a 2.24 GB DeepSeek Janus Pro model, which is multi-modal for output - it can respond with generated images in addition to text.
https://huggingface.co/blog/embeddinggemma#transformersjs loads 400 MB for an EmbeddingGemma demo (embeddings, not LLMs).
I've collected a few more of these demos here: https://simonwillison.net/tags/transformers-js/
You can also get this working with web-llm - https://github.com/mlc-ai/web-llm - here's my write-up of a demo that uses that: https://simonwillison.net/2024/Nov/29/structured-generation-... |
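For reference, a minimal sketch of how a page typically drives Transformers.js; the package name, model id and options below are assumptions based on the library's v3 docs, not taken from the demos above:

    // Rough sketch of in-browser text generation with Transformers.js (v3-style API).
    import { pipeline } from '@huggingface/transformers';

    // Downloads (and caches) the ONNX weights, preferring WebGPU where available.
    const generator = await pipeline(
      'text-generation',
      'onnx-community/Llama-3.2-1B-Instruct',   // placeholder model id
      { dtype: 'q4f16', device: 'webgpu' }
    );

    const messages = [{ role: 'user', content: 'Explain WebGPU in one sentence.' }];
    const output = await generator(messages, { max_new_tokens: 64 });
    console.log(output[0].generated_text.at(-1).content);  // last message = the model's reply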
| |
| ▲ | mg 4 days ago | parent [-] | | This might be a misunderstanding. Did you see the "button that the user can click to select a model from their file system" part of my comment? I tried some of the demos of transformers.js, but they all seem to load the model from a server, which is super slow. I would like to have a page that lets me use any model I have on my disk. | | |
|
|
| ▲ | SparkyMcUnicorn 4 days ago | parent | prev | next [-] |
| Yes. MLC's inference engine runs on WebGPU/WASM. https://github.com/mlc-ai/web-llm-chat https://github.com/mlc-ai/mlc-llm https://github.com/mlc-ai/web-llm |
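A minimal sketch of what using web-llm looks like from page JavaScript; the model id is just an example and must match an entry in web-llm's prebuilt catalog:

    // Sketch: web-llm compiles models to WebGPU and exposes an OpenAI-style API in the browser.
    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    // Example/assumed model id - check web-llm's prebuilt model list for current entries.
    const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
      initProgressCallback: (report) => console.log(report.text),  // download/compile progress
    });

    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: "Hello from the browser!" }],
    });
    console.log(reply.choices[0].message.content);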
| |
| ▲ | mg 4 days ago | parent [-] | | Yeah, something like that, but without the WebGPU requirement. Neither Firefox nor Chromium supports WebGPU on Linux - maybe behind flags, but before using a technology, I would wait until it is available in the default config. Let's see when browsers bring WebGPU to Linux. | | |
|
|
| ▲ | generalizations 4 days ago | parent | prev | next [-] |
| This is an in-browser llama.cpp implementation: https://github.com/ngxson/wllama
Related is the in-browser Whisper implementation: https://ggml.ai/whisper.cpp/ |
|
| ▲ | vonneumannstan 4 days ago | parent | prev | next [-] |
| This one is pretty cool: it compiles the GGUF of an open-weights LLM directly into a single executable, which opens a chat interface in the browser and can also launch a locally hosted OpenAI-style API. It doesn't work quite as well on Windows due to the executable file size limit, but seems great for Mac/Linux. https://github.com/Mozilla-Ocho/llamafile |
|
| ▲ | adastra22 4 days ago | parent | prev | next [-] |
| You don’t need a browser to sandbox something. It's easier and more performant to do GPU passthrough to a container or VM. |
| |
| ▲ | 01HNNWZ0MV43FF 4 days ago | parent [-] | | A container or VM is a bigger commitment. VMs need root, and containers need the docker group plus something like docker-compose or a shell script. Idk, it's just like: do I want to run to the store, buy a 24-pack of water bottles and stash them somewhere, or do I want to open the tap and have clean drinking water? | | |
| ▲ | adastra22 3 days ago | parent [-] | | Neither of those requirements is true on recent OS versions. Users have had the ability to create containers or VMs without special privileges for a very long time now. |
|
|
|
| ▲ | paulirish 4 days ago | parent | prev | next [-] |
| Beyond all the wasm/webgpu approaches other folks have linked (mostly in the transformers.js ecosystem), there's been a standardized API brewing since 2019: https://webmachinelearning.github.io/webnn-intro/ Demos here: https://webmachinelearning.github.io/webnn-samples/ I'm not sure any of them allow you to select a model file from disk, but that should be entirely straightforward. |
|
| ▲ | samsolomon 4 days ago | parent | prev | next [-] |
| Is Open WebUI something like what you are looking for? The design has some awkwardness, but overall it's incorporated a ton of great features. https://openwebui.com/ |
| |
| ▲ | mg 4 days ago | parent [-] | | No, I'm looking for an HTML page with a button "Select LLM". After pressing that button and selecting a local LLM from disk, it would show an input field where you can type your question, and then it would use the given LLM to create the answer. I'm not sure what Open WebUI is, but if it were what I mean, they would surely have the page live and not ask users to install Docker etc. | | |
| ▲ | tmdetect 4 days ago | parent | next [-] | | I think what you want is this: https://github.com/mlc-ai/web-llm | |
| ▲ | bravetraveler 4 days ago | parent | prev | next [-] | | It's both what you want and not; the chat/question interface is as you describe, lack-of-installation is not. The LLM work is offloaded to other software, not the browser. I would like to skip maintaining all this crap, though: I like your approach | |
| ▲ | Jemaclus 4 days ago | parent | prev [-] | | You should install it, because it's exactly what you just described. Edit: From a UI perspective, it's exactly what you described. There's a dropdown where you select the LLM, and there's a ChatGPT-style chatbox. You just docker-up and go to town. Maybe I don't understand the rest of the request, but I can't imagine software where a webpage exists and just magically has LLMs available in the browser with no installation? | | |
| ▲ | craftkiller 4 days ago | parent | next [-] | | It doesn't seem exactly like what they are describing. The end-user interface is what they are describing but it sounds like they want the actual LLM to run in the browser (perhaps via webgpu compute shaders). Open WebUI seems to rely on some external executor like ollama/llama.cpp, which naturally can still be self-hosted but they are not executing INSIDE the browser. | | |
| ▲ | Jemaclus 4 days ago | parent [-] | | Does that even exist? It's basically what they described but with some additional installation? Once you install it, you can select the LLM on disk and run it? That's what they asked for. Maybe I'm misunderstanding something. | | |
| ▲ | craftkiller 4 days ago | parent [-] | | Apparently it does, though I'm learning about it for the first time in this thread also. Personally, I just run llama.cpp locally in docker-compose with AnythingLLM for the UI, but I can see the appeal of having it all just run in the browser. https://github.com/mlc-ai/web-llm
https://github.com/ngxson/wllama
| | |
|
| |
| ▲ | andsoitis 4 days ago | parent | prev [-] | | > You should install it, because it's exactly what you just described. Not OP, but it really isn't what they're looking for.
Needing to install stuff vs. simply going to a web page are two very different things. |
|
|
|
|
| ▲ | coip 4 days ago | parent | prev | next [-] |
| Have you seen/used the WebGPU spaces? https://huggingface.co/docs/transformers.js/en/guides/webgpu (edit: its predecessor was using WebGL) |
| |
| ▲ | mg 4 days ago | parent [-] | | WebGPU is not yet available in the default config of Linux browsers, so WebGL would have been perfect :) |
|
|
| ▲ | mudkipdev 4 days ago | parent | prev | next [-] |
| It was done with gemma-3-270m; I hope someone will post a link to it below |
|
| ▲ | vavikk 4 days ago | parent | prev [-] |
| Not the browser, but Electron. For the browser you would have to run a local Node.js server and point the browser app at its local API. I use Electron with Node.js and React for the UI. Yes, I can switch models. |
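A sketch of that local-API variant from the browser side, assuming an OpenAI-compatible server (for example llama.cpp's llama-server) is already running on localhost:8080 - the port and model name are assumptions:

    // Browser-side sketch: the page does no inference itself, it just calls a local server.
    async function ask(question) {
      const res = await fetch("http://localhost:8080/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "local-model",  // many local servers ignore or loosely match this field
          messages: [{ role: "user", content: question }],
        }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }

    ask("Which model are you?").then(console.log);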