| ▲ | amelius 5 days ago |
> I'd love to hook my development tools into a fully-local LLM.

Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models. So... using an RPi is probably not what you want.
|
| ▲ | fexelein 5 days ago | parent | next [-] |
I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There's still real value there, though also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.
| |
| ▲ | dotancohen 5 days ago | parent [-] |
I'd love to hear more about what you're running, and on what hardware. Also, what is your use case? Thanks!
| ▲ | fexelein 4 days ago | parent [-] |
So I am running Ollama on Windows with a 10700K and a 3080 Ti, using models like Qwen3-Coder (4B/8B), Qwen2.5-Coder 14B, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second, depending on the model).

My use case is custom software that I build and host that leverages LLMs, for example for home automation, where I use Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. Currently I'm looking at fine-tuning these kinds of models for my job as a senior dev in finance.
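(For the curious, a minimal sketch of what "locally hosted" looks like in practice, assuming Ollama's default port 11434 and an already-pulled model tag; this is illustrative, not Bropilot's actual code.)

    # Query a locally hosted Ollama model over its REST API.
    import json
    import urllib.request

    def ask_local_llm(prompt: str, model: str = "qwen2.5-coder:14b") -> str:
        payload = json.dumps({
            "model": model,
            "prompt": prompt,
            "stream": False,  # one JSON reply instead of a token stream
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(ask_local_llm("Write a C# extension method that reverses a string."))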
| ▲ | dotancohen 4 days ago | parent [-] |
Thank you. I'll take a look at Bropilot when I get set up locally. Have a great week.
|
|
|
|
| ▲ | littlestymaar 5 days ago | parent | prev | next [-] |
> Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

Interesting, because he also said the future is small "cognitive core" models:

> a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.

https://xcancel.com/karpathy/status/1938626382248149433#m

In which case, a Raspberry Pi sounds like exactly what you need.
| |
| ▲ | ACCount37 5 days ago | parent [-] |
It's not at all trivial to build a "small but highly capable" model. Sacrificing world knowledge is something that can be done, but only to an extent, and it isn't a silver bullet. For an LLM, size is a virtue: the larger a model is, the more intelligent it is, all other things being equal, and even aggressive distillation only gets you so far.

Maybe with significantly better post-training, a lot of distillation from a very large and very capable model, and extremely high-quality synthetic data, you could fit GPT-5 Pro tier reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or to delegate to a larger cloud-hosted model (see the sketch below).

And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train as a frontier run.
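(A minimal sketch of that fallback pattern. The "UNSURE" sentinel and both stub functions are invented for illustration; a real system would use trained tool-calling rather than a magic string.)

    # Small local model answers by default; delegate to a big cloud model
    # when the local model flags that it lacks the knowledge.

    def query_local(prompt: str) -> str:
        # Stand-in for a call to a small on-device model (e.g. via Ollama).
        return "UNSURE"  # pretend the small model declined this one

    def query_cloud(prompt: str) -> str:
        # Stand-in for a call to a large cloud-hosted model.
        return "The Treaty of Westphalia was signed in 1648."

    SYSTEM_HINT = (
        "If answering requires encyclopedic knowledge you may not have, "
        "reply with the single line UNSURE instead of guessing."
    )

    def answer(prompt: str) -> str:
        draft = query_local(f"{SYSTEM_HINT}\n\n{prompt}")
        if draft.strip() == "UNSURE":
            return query_cloud(prompt)  # fall back to the bigger model
        return draft

    print(answer("When was the Treaty of Westphalia signed?"))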
|
|
| ▲ | refulgentis 5 days ago | parent | prev | next [-] |
It's a tough thing. I'm a solo dev supporting ~all of it at high quality, and I cannot imagine using anything other than $X [1] at the leading edge. Why not have the very best?

But Karpathy elides that he is one individual. Across a distribution of individuals, we should expect a nontrivial number who are fine with 5-10% off leading-edge performance. Why? At minimum, because it's free as in beer; at most, because of concerns about connectivity, IP rights, and so on.

[1] gpt-5 finally dethroned sonnet after 7 months
| |
| ▲ | wkat4242 5 days ago | parent [-] |
Today's Qwen3 30B is about as good as last year's state of the art. For me that's more than good enough. Many tasks don't require the best of the best either.
| ▲ | littlestymaar 4 days ago | parent [-] |
So much this: people act as if local models were useless, when they were in awe of last year's proprietary models that were not any better…
|
|
|
| ▲ | MangoToupe 4 days ago | parent | prev | next [-] |
| I'm kind of shocked so many people are willing to ship their code up to companies that built their products on violating copyright. |
|
| ▲ | dpe82 5 days ago | parent | prev [-] |
Mind linking to "his recent talk"? There are a lot of videos of him, so it's a bit difficult to find the most recent one.
| |