▲ tkp-415 | 2 hours ago
Can anyone point me in the direction of getting a model to run locally and efficiently, inside something like a Docker container, on a system with fairly modest computing power (i.e. a MacBook M1 with 8 GB of memory)? Or is my only option to invest in a system with more power? These local models look great, especially something like https://huggingface.co/AlicanKiraz0/Cybersecurity-BaronLLM_O... for assisting in penetration testing. I've experimented with a variety of configurations on my local machine, but in the end it just turns into a makeshift heater.
▲ mft_ | an hour ago
There’s no way around needing a powerful-enough system to run the model. So you either choose a model that fits in what you have (i.e. a small model, or a quantised slightly larger one), or you get access to more powerful hardware, either by buying it or renting it. (IME you don’t need Docker. For an easy start, just install LM Studio and have a play.)

I picked up a second-hand 64GB M1 Max MacBook Pro a while back, for not too much money, for exactly this kind of experimentation. It’s sufficiently fast at running any LLM that fits in memory, but the gap between those models and Claude is considerable. Still, this might be a path for you? It can also run all manner of diffusion models, but there the performance suffers (vs. an older discrete GPU) and you’re sometimes waiting many minutes for an edit or an image.
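If you do try LM Studio, it can also expose a local OpenAI-compatible server (on port 1234 by default, if I remember right), so scripting against whatever model you've loaded looks roughly like the sketch below. Treat the port and model name as placeholders for your own setup:

    # Rough sketch: talk to a model loaded in LM Studio via its local
    # OpenAI-compatible server. Assumes the server is switched on in
    # LM Studio and listening on the default http://localhost:1234/v1.
    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:1234/v1",  # default local server; adjust if you changed the port
        api_key="lm-studio",                  # placeholder; the local server doesn't check it
    )

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; use the identifier of whatever model you've loaded
        messages=[{"role": "user", "content": "What is a quantised GGUF model?"}],
        max_tokens=200,
    )
    print(resp.choices[0].message.content)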
▲ HanClinto | 38 minutes ago
Maybe check out Docker Model Runner: it's built on llama.cpp (in a good way, not like Ollama) and I think it handles most of what you're looking for. https://www.docker.com/blog/run-llms-locally/

As far as how to find good models to run locally, I found this site recently, and I liked the data it provides:
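To give a rough feel for the Docker Model Runner route: once you've pulled a model, it serves an OpenAI-compatible API, so something like the sketch below should work from Python. The base URL, port, and model name here are my assumptions (they depend on how you've enabled host access and which model you pulled), so double-check the blog post above for the exact values on your setup:

    # Rough sketch: query a model served by Docker Model Runner through its
    # OpenAI-compatible API. The base_url assumes host TCP access is enabled
    # on port 12434; the exact port/path can differ, so treat it as a
    # placeholder and confirm against Docker's docs.
    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:12434/engines/v1",  # assumed endpoint; adjust for your install
        api_key="not-needed",                          # the local endpoint ignores the key
    )

    resp = client.chat.completions.create(
        model="ai/smollm2",  # example of a small model that fits in 8 GB; substitute whatever you pulled
        messages=[{"role": "user", "content": "Three tips for running LLMs in 8 GB of RAM?"}],
    )
    print(resp.choices[0].message.content)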
▲ zozbot234 | 2 hours ago
The general rule of thumb is that you should feel free to quantize, even as low as an average of 2 bits per weight, if it lets you run a model with more active parameters. Quantized models aren't perfect by any means, but they're generally preferable to models with fewer, higher-precision parameters. With 8GB usable, that puts models of up to roughly 32B active parameters within reach at heavy quantization.
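The back-of-the-envelope arithmetic behind that, counting weights only (KV cache, context, and runtime overhead also eat into the 8GB, so treat these numbers as optimistic):

    # Rough weight-only memory footprint at different quantization levels.
    # Ignores KV cache, context length, and runtime overhead, which also
    # have to fit in the 8 GB.
    def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
        total_bytes = params_billion * 1e9 * bits_per_weight / 8
        return total_bytes / 1e9  # decimal GB; close enough for a rule of thumb

    for params in (7, 14, 32):
        for bits in (2, 4, 8):
            print(f"{params:>2}B params @ {bits}-bit ~ {weight_footprint_gb(params, bits):4.1f} GB")

So 32B at 2 bits is about 8 GB of weights, 7B at 4 bits about 3.5 GB, and so on.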
▲ xrd | 2 hours ago
I think a better bet is to ask on Reddit: https://www.reddit.com/r/LocalLLM/ Every time I ask the same thing here, people point me there.