Remix.run Logo
datadrivenangel 2 hours ago

"A great way to go is 2x RTX 3090s for a total of 48GB VRAM total. You can then run Qwen3.6-27B, which is an awesome model."

Just want to note that for $3k you can get an M5 macbook pro with 48gb of shared memory, and it will not be a giant box. Also, consider committing to spending that money on a cloud hosting provider, which will be at least somewhat cheaper if not significantly cheaper. It is awesome being able to run models locally though.

LeBit 2 hours ago | parent | next [-]

I’m an idiot who is unable to project itself in situations I’ve never experienced before.

So, I always thought local LLMs were toys not worth pursuing.

Only once have I tried something decent like Gemma 4 31B and Qwen 3.6 27B did I realize how incredibly useful they are.

You stop fearing you are sharing sensitive information.

You stop fearing you will run out of tokens.

You stop fearing about the availability of the remote AI.

Local LLMs are extremely valuable.

bityard 2 hours ago | parent [-]

*for many tasks

WithinReason 15 minutes ago | parent | prev | next [-]

I'm running Qwen3.6-27B on a single 24GB GPU at 80 tok/s, you don't even need 2 of them

jbellis 2 hours ago | parent | prev | next [-]

That's a reasonable option, just be aware that you get about 1/3 as much memory bandwidth with the M5 Pro, or 2/3 with the M5 Max [now you're at $4100 for the lowest-end]. So both your prefill (flops-bound, M5 has a lot less) and decode (bw-bound) will be slower.

Aurornis 2 hours ago | parent | prev | next [-]

I have an M5 MacBook Pro and I also have a separate GPU setup for running models. The difference in speed is significant. It's not just token generation speed, but time to first token (prompt processing).

The M5 hardware is amazing for what it is, but GPUs are still so much faster.

Running the models on the GPU box also means I can use the laptop on my lap instead of turning it into a hot plate.

amelius an hour ago | parent [-]

What is your GPU setup?

boredatoms 2 hours ago | parent | prev | next [-]

The standalone mini/studio is better if you dont want to have a constantly hot laptop

Get a regular laptop and use the network to access the LLM

amelius an hour ago | parent | prev [-]

You can also buy a Jetson Orin with 64GB of unified memory.