behnamoh 5 days ago

Everything runs on a π if you quantize it enough!

I'm curious about the applications though. Do people randomly buy 4xRPi5s that they can now dedicate to running LLMs?

ryukoposting 5 days ago | parent | next [-]

I'd love to hook my development tools into a fully-local LLM. The question is context window and cost. If the context window isn't big enough, it won't be helpful for me. I'm not gonna drop $500 on RPis unless I know it'll be worth the money. I could try getting my employer to pay for it, but I'll probably have a much easier time convincing them to pay for Claude or whatever.

throaway920181 5 days ago | parent | next [-]

It's sad that Pis are now so overpriced. They used to be fun little tinker boards that were semi-cheap.

pseudosavant 5 days ago | parent [-]

The Raspberry Pi Zero 2 W is about as fast as a Pi 3, way smaller, and only costs $15 I think.

The high-end Pis aren't $25 though.

geerlingguy 5 days ago | parent [-]

The Pi 4 is still fine for a lot of low end use cases and starts at $35. The Pi 5 is in a harder position. I think the CM5 and Pi 500 are better showcases for it than the base model.

pseudosavant 3 days ago | parent [-]

Between the microcontrollers, the Zero models, the Pi 4, and the Pi 5, they have quite a full range, from very inexpensive and low-power boards up to moderate price/performance SBCs.

One of the bigger problems with the Pi 5 is that many of the classic Pi use cases don't benefit from more CPU than the Pi 4 had. PCIe is nice, but you might as well go CM5 if you want something like that. The 16GB model would be more interesting if it had the GPU/bandwidth to do AI/tokens at a decent rate, but it doesn't.

I still think using any other brand of SBC is an exercise in futility though. Raspberry Pi products have a community, support base, and ecosystem behind them that no other SBC can match.

amelius 5 days ago | parent | prev | next [-]

> I'd love to hook my development tools into a fully-local LLM.

Karpathy said in his recent talk, on the topic of AI developer assistants: don't bother with less capable models.

So ... using an RPi is probably not what you want.

fexelein 5 days ago | parent | next [-]

I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There's still real value there, though also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.

dotancohen 5 days ago | parent [-]

I'd love to hear more about what you're running, and on what hardware. Also, what is your use case? Thanks!

fexelein 4 days ago | parent [-]

So I am running Ollama on Windows using an i7-10700K and a 3080 Ti. I'm using models like Qwen3-Coder (4/8B), Qwen2.5-Coder 14B, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second depending on the model).

My use case is custom software that I build and host that leverages LLMs, for example for home automation, where I use Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. Currently looking at fine-tuning these types of models for my work as a senior dev in finance.
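
If anyone wants to wire up something similar, the integration side is mostly just HTTP calls against Ollama's local REST API. A minimal sketch (the model tag and prompt are placeholders, swap in whatever you've pulled):

    import requests  # pip install requests

    # One-shot, non-streaming completion against a local Ollama server.
    # Endpoint and payload follow Ollama's /api/generate API.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5-coder:14b",  # assumes `ollama pull qwen2.5-coder:14b`
            "prompt": "Write a C# extension method that title-cases a string.",
            "stream": False,               # one JSON blob instead of a token stream
        },
        timeout=120,
    )
    print(resp.json()["response"])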

dotancohen 4 days ago | parent [-]

Thank you. I'll take a look at Bropilot when I get set up locally.

Have a great week.

littlestymaar 5 days ago | parent | prev | next [-]

> Karpathy said in his recent talk, on the topic of AI developer assistants: don't bother with less capable models.

Interesting because he also said the future is small "cognitive core" models:

> a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.

https://xcancel.com/karpathy/status/1938626382248149433#m

In which case, a Raspberry Pi sounds like what you need.

ACCount37 5 days ago | parent [-]

It's not at all trivial to build a "small but highly capable" model. Sacrificing world knowledge is something that can be done, but only to an extent, and that isn't a silver bullet.

For an LLM, size is a virtue - the larger a model is, the more intelligent it is, all other things being equal - and even aggressive distillation only gets you so far.

Maybe with significantly better post-training, a lot of distillation from a very large and very capable model, and extremely high quality synthetic data, you could fit GPT-5 Pro tier of reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or delegate to a larger cloud-hosted model.
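
For reference, distillation here means training the small model to match the big model's softened output distribution rather than hard labels. A toy Python sketch of the loss (made-up numbers, not from any real training run):

    import numpy as np

    def softmax(z, T=1.0):
        # Temperature T > 1 softens the distribution, exposing more of
        # the teacher's "dark knowledge" about wrong-but-plausible options.
        z = z / T
        z = z - z.max()  # numerical stability
        e = np.exp(z)
        return e / e.sum()

    T = 2.0
    teacher_logits = np.array([4.0, 1.0, 0.5])  # from a large teacher model
    student_logits = np.array([2.0, 1.5, 0.2])  # from a small student model

    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kd_loss = np.sum(p * (np.log(p) - np.log(q)))  # KL(teacher || student)
    print(kd_loss)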

And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train as a frontier run.

littlestymaar 4 days ago | parent [-]

Nobody said it's trivial.

refulgentis 5 days ago | parent | prev | next [-]

It's a tough thing. I'm a solo dev supporting ~all of it at high quality, and I cannot imagine using anything other than $X[1] at the leading edge. Why not have the very best?

Karpathy elides that he is an individual. We'd expect to find a distribution of individuals, such that a nontrivial number of them are fine with 5-10% off leading-edge performance. Why? At the least, free as in beer. At the most, concerns about connectivity, IP rights, and so on.

[1] GPT-5 finally dethroned Sonnet after 7 months

wkat4242 5 days ago | parent [-]

Today's Qwen3 30B is about as good as last year's state of the art. For me that's more than good enough. Many tasks don't require the best of the best either.

littlestymaar 4 days ago | parent [-]

So much this: people act as if local models were useless, when a year ago they were in awe of proprietary models that weren't any better…

MangoToupe 4 days ago | parent | prev | next [-]

I'm kind of shocked so many people are willing to ship their code up to companies that built their products on violating copyright.

dpe82 5 days ago | parent | prev [-]

Mind linking to "his recent talk"? There are a lot of videos of him, so it's a bit difficult to find the most recent.

amelius 5 days ago | parent [-]

https://www.youtube.com/watch?v=LCEmiRjPEtQ

dpe82 5 days ago | parent [-]

Ah that one. Thanks!

exitb 5 days ago | parent | prev | next [-]

I think the problem is that getting multiple Raspberry Pis is never the cost-effective way to run heavy loads.

rs186 5 days ago | parent | prev | next [-]

$500 gives you about six 8GB RPi 5s or four 16GB ones, excluding accessories or other equipment necessary to get this working.

You'll be much better off spending that money on something else more useful.

behnamoh 5 days ago | parent | next [-]

> $500

Yeah, like a Mac Mini or something with better bandwidth.

ekianjo 5 days ago | parent | prev [-]

Raspberry Pis going up in price makes them very unattractive, since there's a wealth of cheap, better second-hand hardware out there, such as NUCs with Celerons.

pdntspa 5 days ago | parent | prev | next [-]

Model intelligence should be part of your equation as well, unless you love loads and loads of hidden technical debt and context-eating, unnecessarily complex abstractions

giancarlostoro 5 days ago | parent | next [-]

GPT OSS 20B is smart enough, but the context window gets tiny once you include enough files. I wonder if you could make a dumber model with a massive context window that's a middleman to GPT.
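
A rough sketch of that middleman idea, assuming a local Ollama server and placeholder model tags: a cheap long-context model compresses the files into a digest, and the smarter model only ever sees the digest.

    import requests

    OLLAMA = "http://localhost:11434/api/generate"

    def ask(model, prompt):
        # One-shot, non-streaming call to a local Ollama server.
        r = requests.post(OLLAMA, json={"model": model, "prompt": prompt,
                                        "stream": False}, timeout=300)
        return r.json()["response"]

    files = {"main.py": open("main.py").read()}  # imagine many more files here

    # Hop 1: a cheap long-context model squeezes the codebase into a digest.
    digest = ask("llama3.2:3b",
                 "Summarize the key APIs and data flow:\n\n"
                 + "\n\n".join(f"## {name}\n{src}" for name, src in files.items()))

    # Hop 2: the smarter but smaller-window model reasons over the digest only.
    answer = ask("gpt-oss:20b",
                 f"Given this codebase digest:\n{digest}\n\n"
                 "Where should I add request caching?")
    print(answer)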

pdntspa 5 days ago | parent [-]

Matches my experience.

giancarlostoro 5 days ago | parent [-]

Just have it open a new context window. The other thing I wanted to try is making a LoRA, but I'm not sure how that works properly. It suggested a whole other model, but it wasn't a pleasant experience, since it's not as obvious as it is with diffusion models for images.
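
For what it's worth, on the text side a LoRA is usually done with Hugging Face's peft library. A minimal sketch of the setup (the base model and target modules here are assumptions, adjust for whatever you actually fine-tune):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Load a small base model (placeholder; pick your own).
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B")

    # LoRA: freeze the base weights and learn low-rank adapters on top.
    config = LoraConfig(
        r=8,                                  # rank of the adapter matrices
        lora_alpha=16,                        # scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of the base model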

th0ma5 5 days ago | parent | prev [-]

How do you evaluate this except by anecdote, and how do we know your experience isn't due to how you use them?

pdntspa 5 days ago | parent [-]

You can evaluate it as anecdote. How do I know you have the level of experience necessary to spot these kinds of problems as they arise? How do I know you're not just another AI booster with a financial stake poisoning the discussion?

We could go back and forth on this all day.

exe34 5 days ago | parent [-]

You got very defensive. It was a useful question - they were asking in terms of using a local LLM, so at best they might be in the business of selling Raspberry Pis, not proprietary LLMs.

th0ma5 3 days ago | parent [-]

Yeah, to me it's more poisonous that people reflexively believe any pushback must be wrong, because people feel empowered regardless of any measurement that might point out they only (maybe) get out of LLMs what they put into them - and even then we can't be sure. That this situation exists, and that people have been primed with a complete triangulation of all the arguments, simply isn't healthy. We should demand independent measurements instead of the fumbling in the dark of the current model benchmarks... or admit that measuring them isn't helpful and, as a parent comment maybe alluded to, results can only be described as anecdote, with no discernible difference between many models.

fastball 5 days ago | parent | prev | next [-]

Capability of the model itself is presumably a more important question than the other two, no?

numpad0 5 days ago | parent | prev | next [-]

An AMD MI50 is cheaper

halJordan 5 days ago | parent | prev [-]

This is some sort of joke right?

giancarlostoro 5 days ago | parent | prev | next [-]

Sometimes you buy a Pi for one project, start on it, buy another for a different project, and before you know it none are complete and you have ten Raspberry Pis lying around across various generations. ;)

dotancohen 5 days ago | parent [-]

Arduino hobbyist, same issue.

Though I must admit to first noticing the trend decades before discovering Arduino, when I looked at the stack of 289, 302, and 351W intake manifolds on my shelf and realised that I needed the width of the 351W manifold but the fuel injection of the 302. Some things just never change.

giancarlostoro 5 days ago | parent [-]

I have different model Raspberry Pis and I'm having a hard time justifying buying a 5... but if I can run LLMs off one or two... I just might. I guess what the next Raspberry Pi needs is a genuinely impressive GPU that COULD run small AI models, so people will start cracking at it.

hhh 5 days ago | parent | prev | next [-]

I have clusters of over a thousand Raspberry Pis that generally have 75% of their compute and 80% of their memory completely unused.

Moto7451 5 days ago | parent | next [-]

That’s an interesting setup. What are you doing with that sort of cluster?

estimator7292 5 days ago | parent [-]

99.9% of enthusiast/hobbyist clusters like this are exclusively used for blinkenlights

wkat4242 5 days ago | parent [-]

Blinkenlights are an admirable pursuit

estimator7292 5 days ago | parent [-]

That wasn't a judgement! I filled my homelab rack server with mechanical drives so I can get clicky noises along with the blinky lights

CamperBob2 5 days ago | parent | prev | next [-]

Good ol' Amdahl in action.

fragmede 5 days ago | parent | prev | next [-]

That sounds awesome, do you have any pictures?

larodi 5 days ago | parent | prev [-]

Is it solar powered?

Zenst 5 days ago | parent | prev | next [-]

Depends on the model - if you have a sparse MoE model, then you can divide it up across smaller nodes; your dense 30B models I do not see flying anytime soon.
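
A toy illustration of why sparse MoE divides up well (made-up shapes, not any real model): the router picks only k of E experts per token, so each node only ever needs the expert weights it hosts.

    import numpy as np

    # Toy top-k MoE router: per token, only k of E experts actually run,
    # which is why the expert weights can live on different nodes.
    E, k, d = 8, 2, 16
    rng = np.random.default_rng(0)
    x = rng.standard_normal(d)            # one token's hidden state
    W_gate = rng.standard_normal((E, d))  # router weights

    scores = W_gate @ x
    active = np.argsort(scores)[-k:]      # the k experts this token routes to
    gates = np.exp(scores[active] - scores[active].max())
    gates /= gates.sum()                  # softmax over just the active experts

    experts = rng.standard_normal((E, d, d))  # stand-in for per-expert FFN weights
    y = sum(g * (experts[e] @ x) for g, e in zip(gates, active))
    print(active, y.shape)                # only 2 of 8 experts touched this token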

An Intel Arc Pro B50 in a dumpster PC would do you far better with this model (not enough RAM for a dense 30B, alas), getting close to 20 tokens a second, and so much cheaper.

ugh123 5 days ago | parent | prev | next [-]

I think it serves as a good test bed for methods and models. We'll see if someday they can reduce it to 3... 2... 1 Pi 5s that can match performance.

blululu 5 days ago | parent | prev | next [-]

For $500 you may as well spend an extra $100 and get a Mac mini with an M4 chip and 256GB of RAM, and avoid the headaches of coordinating 4 machines.

MangoToupe 4 days ago | parent [-]

I don't think you can get 256 gigs of RAM in a Mac mini for $600. I do endorse the Mac as an AI workbench tho

piecerough 5 days ago | parent | prev | next [-]

"quantize enough"

though at what quality?

dotancohen 5 days ago | parent [-]

Quantity has a quality all its own.

6r17 5 days ago | parent | prev [-]

I mean, at this point it's more of a "proof-of-work" with a shared BP; I could definitely see some home-automation hacker get this running - hell, maybe I'll do it too if I have some spare time and want to make something like Alexa with customized stuff. It would still need text-to-speech and speech-to-text, but that's not really the topic of his setup. Even for pro use, if it's really usable, why not just spawn Qwen on ARM if that's cheaper? There are a lot of ways to read and leverage such a benchmark.