ryukoposting 5 days ago

I'd love to hook my development tools into a fully-local LLM. The question is context window and cost. If the context window isn't big enough, it won't be helpful for me. I'm not gonna drop $500 on RPis unless I know it'll be worth the money. I could try getting my employer to pay for it, but I'll probably have a much easier time convincing them to pay for Claude or whatever.

throaway920181 5 days ago | parent | next [-]

It's sad that Pis are now so overpriced. They used to be fun little tinker boards that were semi-cheap.

pseudosavant 5 days ago | parent [-]

The Raspberry Pi Zero 2 W is about as fast as a Pi 3, way smaller, and costs just $15.

The high-end Pis aren’t $25 though.

geerlingguy 5 days ago | parent [-]

The Pi 4 is still fine for a lot of low end use cases and starts at $35. The Pi 5 is in a harder position. I think the CM5 and Pi 500 are better showcases for it than the base model.

pseudosavant 3 days ago | parent [-]

Between the microcontrollers, the Zero models, the Pi 4, and the Pi 5, they cover quite a full range, from very inexpensive, low-power boards to moderately priced, higher-performance SBCs.

One of the bigger problems with the Pi 5 is that many of the classic Pi use cases don't benefit from more CPU than the Pi 4 had. PCIe is nice, but you might as well go with a CM5 if you want something like that. The 16GB model would be more interesting if it had the GPU and memory bandwidth to generate AI tokens at a decent rate, but it doesn't.

I still think using any other brand of SBC is an exercise in futility, though. Raspberry Pi products have a community, support, and ecosystem behind them that no other SBC can match.

amelius 5 days ago | parent | prev | next [-]

> I'd love to hook my development tools into a fully-local LLM.

Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

So... using an RPi is probably not what you want.

fexelein 5 days ago | parent | next [-]

I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There's still real value there, though also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.

dotancohen 5 days ago | parent [-]

I'd love to hear more about what you're running, and on what hardware. Also, what is your use case? Thanks!

fexelein 4 days ago | parent [-]

So I'm running Ollama on Windows with a 10700K and a 3080 Ti, using models like Qwen3-Coder (4/8B), Qwen2.5-Coder 14B, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second, depending on the model).

My use case is custom software that I build and host that leverages LLMs, for example for home automation, where I use Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. I'm currently looking at fine-tuning these kinds of models for work; I'm a senior dev in finance.
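
For anyone curious what the wiring looks like: Ollama exposes an OpenAI-compatible HTTP endpoint, so a tool can talk to a local model with a plain POST request. A minimal sketch in Python (the model tag and prompt are illustrative; Ollama's default port 11434 is assumed):

    import requests

    # Minimal sketch: ask a locally hosted Ollama model a coding question.
    # Assumes Ollama is running on its default port (11434) and that a
    # coder model such as "qwen2.5-coder:14b" has already been pulled.
    OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

    def ask_local_model(prompt: str, model: str = "qwen2.5-coder:14b") -> str:
        resp = requests.post(OLLAMA_URL, json={
            "model": model,
            "messages": [
                {"role": "system", "content": "You are a concise code assistant."},
                {"role": "user", "content": prompt},
            ],
        }, timeout=120)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    print(ask_local_model("Explain this regex: ^\\d{4}-\\d{2}$"))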

dotancohen 4 days ago | parent [-]

Thank you. I'll take a look at Bropilot when I get set up locally.

Have a great week.

littlestymaar 5 days ago | parent | prev | next [-]

> Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

Interesting because he also said the future is small "cognitive core" models:

> a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.

https://xcancel.com/karpathy/status/1938626382248149433#m

In which case, a Raspberry Pi sounds like exactly what you need.
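
For a sense of what that looks like in practice: a quantized few-billion-parameter model already fits in Pi-class RAM. A minimal sketch with llama-cpp-python (the GGUF file name is an assumption; you'd download a 4-bit quantized model first):

    from llama_cpp import Llama

    # Sketch of an always-on "cognitive core" on a small board. The GGUF
    # path is illustrative: a ~3B model quantized to 4 bits fits in a few
    # GB of RAM. n_threads matches the Pi 5's four cores.
    llm = Llama(model_path="qwen2.5-3b-instruct-q4_k_m.gguf",
                n_ctx=4096, n_threads=4, verbose=False)

    out = llm.create_chat_completion(messages=[
        {"role": "user", "content": "Rename these to snake_case: fooBar, bazQux"},
    ])
    print(out["choices"][0]["message"]["content"])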

ACCount37 5 days ago | parent [-]

It's not at all trivial to build a "small but highly capable" model. Sacrificing world knowledge is something that can be done, but only to an extent, and that isn't a silver bullet.

For an LLM, size is a virtue - the larger a model is, the more intelligent it is, all else being equal - and even aggressive distillation only gets you so far.

Maybe with significantly better post-training, a lot of distillation from a very large and very capable model, and extremely high quality synthetic data, you could fit GPT-5 Pro tier of reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or delegate to a larger cloud-hosted model.

And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train as a frontier run.
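
To make the delegation idea concrete, here's a hypothetical sketch of that routing logic. Every function body is a stand-in (a real system would call an on-device model, a search API, and a cloud model), and the confidence-threshold convention is just one possible trigger:

    # Hypothetical sketch of the "know when to fall back or delegate"
    # behavior described above. All three backends are stubbed out.

    def small_local_model(query: str) -> tuple[str, float]:
        # Stand-in for a ~4B on-device model that also reports confidence.
        return f"local draft answer to: {query}", 0.5

    def web_search(query: str) -> str:
        return f"search snippets for: {query}"  # stand-in for retrieval

    def large_cloud_model(query: str, context: str = "") -> str:
        return f"cloud answer to: {query}"      # stand-in for delegation

    def answer(query: str, threshold: float = 0.8) -> str:
        draft, confidence = small_local_model(query)
        if confidence >= threshold:
            return draft                          # cheap, private local path
        context = web_search(query)               # first patch missing facts
        return large_cloud_model(query, context)  # then delegate hard cases

    print(answer("What year did the Raspberry Pi 5 launch?"))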

littlestymaar 4 days ago | parent [-]

Nobody said it's trivial.

refulgentis 5 days ago | parent | prev | next [-]

It's a tough thing: I'm a solo dev supporting ~everything at high quality. I can't imagine using anything other than $X[1] at the leading edge. Why not have the very best?

Karpathy elides that he is one individual. Across a distribution of individuals, we'd expect a nontrivial number of them to be fine with 5-10% off leading-edge performance. Why? At the least, because it's free as in beer; at most, because of concerns about connectivity, IP rights, and so on.

[1] GPT-5 finally dethroned Sonnet after 7 months

wkat4242 5 days ago | parent [-]

Today's Qwen3 30B is about as good as last year's state of the art. For me, that's more than good enough. Many tasks don't require the best of the best, either.

littlestymaar 4 days ago | parent [-]

So much this: people act as if local models were useless, when a year ago they were in awe of proprietary models that weren't any better…

MangoToupe 4 days ago | parent | prev | next [-]

I'm kind of shocked so many people are willing to ship their code up to companies that built their products on violating copyright.

dpe82 5 days ago | parent | prev [-]

Mind linking to "his recent talk"? There are a lot of videos of him, so it's a bit difficult to find the most recent one.

amelius 5 days ago | parent [-]

https://www.youtube.com/watch?v=LCEmiRjPEtQ

dpe82 5 days ago | parent [-]

Ah that one. Thanks!

exitb 5 days ago | parent | prev | next [-]

I think the problem is that buying multiple Raspberry Pis is never the cost-effective way to run heavy loads.

rs186 5 days ago | parent | prev | next [-]

$500 gives you about six 8GB RPi 5s (~$80 each) or four 16GB ones (~$120 each), excluding accessories or other equipment needed to get this working.

You'll be much better off spending that money on something else more useful.

behnamoh 5 days ago | parent | next [-]

> $500

Yeah, like a Mac Mini or something with better bandwidth.

ekianjo 5 days ago | parent | prev [-]

Raspberry Pis going up in price makes them very unattractive, since there's a wealth of better, cheap second-hand hardware out there, such as NUCs with Celerons.

pdntspa 5 days ago | parent | prev | next [-]

Model intelligence should be part of your equation as well, unless you love loads and loads of hidden technical debt and context-eating, unnecessarily complex abstractions.

giancarlostoro 5 days ago | parent | next [-]

GPT-OSS 20B is smart enough, but the context window gets tiny once enough files are loaded. I wonder if you could make a dumber model with a massive context window that acts as a middleman for GPT.
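
A hypothetical sketch of that middleman pattern, assuming both models are served from a local Ollama instance (the model tags are illustrative): a cheap long-context model condenses each file, and only the summaries reach the stronger model.

    import requests

    URL = "http://localhost:11434/v1/chat/completions"

    def chat(model: str, prompt: str) -> str:
        r = requests.post(URL, json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }, timeout=300)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"]

    def condensed_answer(files: dict[str, str], question: str) -> str:
        # Step 1: the long-context model squeezes each file down.
        summaries = [
            name + ":\n" + chat("llama3.2:3b",
                                "Summarize for a code reviewer:\n" + text)
            for name, text in files.items()
        ]
        # Step 2: the smarter model reasons over summaries, not raw files.
        return chat("gpt-oss:20b",
                    "\n\n".join(summaries) + "\n\nQuestion: " + question)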

pdntspa 5 days ago | parent [-]

Matches my experience.

giancarlostoro 5 days ago | parent [-]

Just have it open a new context window. The other thing I wanted to try is making a LoRA, but I'm not sure how that works properly; it suggested a whole other model, and it wasn't a pleasant experience, since it's not as obvious as making LoRAs for diffusion image models.
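
For reference, the text-model equivalent of an image LoRA typically goes through Hugging Face's peft library. A minimal sketch, with the base model and hyperparameters as illustrative assumptions (the training loop itself is omitted):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    # Illustrative: wrap a small causal LM with LoRA adapters. Only the
    # low-rank adapter weights train; the base model stays frozen.
    base = "Qwen/Qwen2.5-Coder-1.5B"          # base model is an assumption
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    config = LoraConfig(
        r=16,                                  # adapter rank
        lora_alpha=32,                         # scaling factor
        target_modules=["q_proj", "v_proj"],   # attach to attention projections
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()         # typically <1% of base weights

From there it's a standard Trainer loop over your own examples; only the adapter weights update.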

th0ma5 5 days ago | parent | prev [-]

How do you evaluate this except by anecdote, and how do we know your experience isn't due to how you use them?

pdntspa 5 days ago | parent [-]

You can evaluate it as anecdote. How do I know you have the level of experience necessary to spot these kinds of problems as they arise? How do I know you're not just another AI booster with financial stake poisoning the discussion?

We could go back and forth on this all day.

exe34 5 days ago | parent [-]

You got very defensive. It was a useful question - they were asking about using a local LLM, so at best they might be in the business of selling Raspberry Pis, not proprietary LLMs.

th0ma5 3 days ago | parent [-]

Yeah, to me it's more poisonous that people reflexively believe any pushback must be wrong. People feel empowered regardless of any measurement that might point out that they only (maybe) get out of LLMs what they put into them, and even then we can't be sure. That this situation exists, and that people have been primed with a complete triangulation of all the arguments, simply isn't healthy. We should demand independent measurements instead of the fumbling in the dark of the current model benchmarks... or admit that measuring them isn't helpful and that, as a parent comment maybe alluded to, everything can only be described as anecdote, with no discernible difference between many models.

fastball 5 days ago | parent | prev | next [-]

Capability of the model itself is presumably the more important question than those other two, no?

numpad0 5 days ago | parent | prev | next [-]

An AMD Instinct MI50 is cheaper.

halJordan 5 days ago | parent | prev [-]

This is some sort of joke right?