What’s the price point for getting into that sweet spot?

I’m on an M1 Max with 32GB VRAM, so I’m looking forward to the 27B or 35B-A3B models. Is dropping $5k for an RTX 6000 or a DGX Spark really the best option?

▲

anonym29 a month ago | parent | next [-]

Strix Halo at $2k with similar TG and about half the PP of DGX Spark was a pretty good deal IMO, especially considering it's also a full x86 system... 16c/32t Zen 5, 40 CU RDNA 3.5, 128 GB unified memory at ~220 GB/s real-world speeds (256 GB/s theoretical) - that runs full tilt at 140W in performance mode and idles at ~10W.

Unfortunately, the prices rose on these a lot, but unevenly. Beelink GTR 9 Pro is $4400, Framework Desktop is ~$3500, for what is basically the exact same mainboard as a Bosgame M5 for $2800.

Apple's M5 Max is another attractive option. Apple silicon traditionally had great MBW and was good at TG but struggled with PP; the new neural engines in those GPU cores have made a big difference in a good way here.

Gorgon Halo is rumored for June announcement with Q4'26 release with basically +100 MHz clocks on Strix Halo, LPDDR5X-8533 instead of LPDDR5X-8000, but more importantly, 192 GB max instead of 128 GB.

I'd say it's better to wait for Gorgon Halo than to grab Strix Halo now. However, Medusa Halo, rumored for H2'27, is slated to have up to 26c Zen 6 (heterogeneous cores - kinda funny that AMD is heading towards these as Intel retreats from them), 48 CU of RDNA 5 instead of 40 CU RDNA 3.5, and a 384 bit bus w/ LPDDR6, which should make 256 GB at more like ~490-600 GB/s MBW, which will really make Strix and Gorgon Halo obsolete.

Also worth keeping an eye out for Serpent Lake (Intel CPU + NVIDIA iGPU on a single board with unified memory, rumored for 2028-2029 iirc), and on the 160 GB Crescent Island Intel dGPU.
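For anyone who wants to sanity-check the bandwidth figures above, here is a rough sketch of the arithmetic. It assumes Strix Halo uses a 256-bit bus (which is what 256 GB/s at LPDDR5X-8000 implies) and that Gorgon Halo keeps the same bus width; the function names are made up for illustration, and the Medusa Halo numbers are only the data rates implied by the rumored 384-bit bus and the ~490-600 GB/s range.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9


def implied_data_rate_mts(bus_width_bits: int, bandwidth_gbs: float) -> float:
    """Data rate in MT/s that a given bus width would need to reach bandwidth_gbs."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gbs * 1e9 / bytes_per_transfer / 1e6


# Strix Halo: LPDDR5X-8000 on an assumed 256-bit bus.
print(peak_bandwidth_gbs(256, 8000))            # 256.0

# Gorgon Halo: LPDDR5X-8533, ASSUMING the bus stays 256-bit.
print(round(peak_bandwidth_gbs(256, 8533), 1))  # 273.1

# Medusa Halo: data rates implied by a 384-bit bus at ~490-600 GB/s.
print(round(implied_data_rate_mts(384, 490)))   # 10208
print(round(implied_data_rate_mts(384, 600)))   # 12500
```

So the Gorgon Halo memory bump alone would be worth roughly 7% more theoretical bandwidth, while the Medusa Halo range is about double Strix Halo.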

▲

tempoponet a month ago | parent | prev | next [-]

Expect to pay $4k-10k

- Your RTX 6000 is closer to $10k now

- Sparks are creeping into the $4-5k range

- AMD Strix are ~3.5k

- Apple depends on chipset and memory. Sweet spot would be 128gb M3 Ultra, probably $6-8k but admittedly haven't been tracking closely. New M5 might come in the fall. You can get a new 128gb M5 Max laptop for ~5-6k today.

- a 4x3090 rig would take $5-6k

Every platform has tradeoffs, but it's mostly ecosystem, memory bandwidth, and power consumption. They're all slow. The best option is likely to rent hardware on Runpod. The ROI on self-hosting is very low unless you have a specific need or you're ok treating it as a hobby.

▲

anonym29 a month ago | parent | next [-]

Bosgame M5 (Strix Halo) w/ 128 GB still goes for $2800 right now. SH systems have surged in price dramatically but quite unevenly.

>The best option is likely to rent hardware on Runpod.

Vast.ai is much cheaper, but the broader point here is contestable. The only dimension in which cloud GPU rentals win is cost. You lose the confidentiality, integrity, and availability benefits of local deployments.

	▲	ai_fry_ur_brain a month ago | parent [-]
		Rentals are priced to pay for themselves in 1-1.5 years (when renting them out per hour, not selling tokens). It's never a better option to rent. Not that I'd encourage anyone to throw large amounts of money at having access to LLMs, but you're definitely going to be better off buying something that you can amortize over multiple years with a multi-year warranty.
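The payback claim can be turned into a quick break-even calculation. This is only a sketch: the $10k purchase price is the RTX 6000 figure quoted elsewhere in the thread, the hourly rates are derived from the 1-1.5 year payback window rather than taken from any real price list, and electricity, resale value, and idle time on the provider's side are all ignored. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def implied_hourly_rate(purchase_price: float, payback_years: float) -> float:
    """Hourly rate at which a card rented out 24/7 recoups its price in payback_years."""
    return purchase_price / (payback_years * HOURS_PER_YEAR)


def breakeven_years(purchase_price: float, hourly_rate: float, hours_per_day: float) -> float:
    """Years of renting hours_per_day before total rent equals the purchase price."""
    return purchase_price / (hourly_rate * hours_per_day * 365)


price = 10_000  # assumed purchase price in USD
fast = implied_hourly_rate(price, 1.0)  # rate for a 1-year payback
slow = implied_hourly_rate(price, 1.5)  # rate for a 1.5-year payback
print(f"implied rate: ${slow:.2f}-${fast:.2f} per hour")

# How long a renter takes to spend the purchase price at different usage levels.
for hours in (2, 8, 24):
    years = breakeven_years(price, fast, hours)
    print(f"{hours:>2} h/day -> break-even after {years:.1f} years")
```

Under these assumptions the break-even point scales inversely with daily usage, so how quickly buying pays off depends mostly on how many hours a day the hardware is actually busy.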

▲

bahmboo a month ago | parent | prev | next [-]

$2600 gets MBP M5 Pro 48gb. 64gb requires a Max which bumps it to $4200 at which point you may as well spend the $800 to go to 128gb.

▲

ai_fry_ur_brain a month ago | parent | prev [-]

And for what? Spend 10-15k for the sloppiest of slop code, non-deterministic automations, and the ability to spawn an AI gf?

This whole thing is really starting to remind me of the crypto hype phases of 2016-2018 when everyone thought their investment in GPUs was going to make them rich.

▲

dvfjsdhgfv a month ago | parent | next [-]

I upvoted your comment even though I disagree with you.

Yes, LLMs are sloppy, and local models usually more so (but things change fast).

But the local ones have one big advantage: they are private. So you can safely feed them the collection of your private documents and things you wouldn't trust people like sama with. The fact that some people do not care is one of the failures of our educational system.

▲

organsnyder a month ago | parent | prev | next [-]

It is possible to get real work done with LLMs. There are plenty of ethical concerns, and they're definitely over-hyped, but they are exceptionally useful tools when used well.

	▲	varispeed a month ago | parent [-]
		[dead]

▲

a month ago | parent | prev | next [-]

[deleted]

▲

gamander2 a month ago | parent | prev [-]

[dead]

	▲	gcr 25 days ago | parent [-]
		which models do you have in mind? grok from xai?

▲

embedding-shape a month ago | parent | prev | next [-]

If I could find an RTX Pro 6000 for $5K I'd definitely grab it. I'm running RedHatAI/Qwen3.6-35B-A3B-NVFP4 on one (I had to pay closer to $10K for it though) with 260K context and it's a blast! ds4 by antirez also works well, and even IQ2XXS seems to work relatively well, but Qwen3.6-35B-A3B-NVFP4 is both faster and gives higher-quality responses (at least for coding and translations, which is what I mostly use them for).
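As a rough guide to what fits where, the weight footprint of a model is just parameter count times bits per weight. The sketch below assumes a flat 4 bits per weight for a 4-bit format and 16 bits for unquantized weights; real quantized files carry extra scaling metadata and may keep some layers at higher precision, and the KV cache for long contexts comes on top, so treat these as lower bounds. The function name is made up for illustration.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Model sizes mentioned in this thread, at an ASSUMED flat 4 and 16 bits per weight.
for params in (27, 35):
    for bits in (4, 16):
        size = weight_footprint_gb(params, bits)
        print(f"{params}B at {bits}-bit: ~{size:.1f} GB of weights")
```

By this estimate a 35B model at 4 bits needs about 17.5 GB for weights and a 27B model about 13.5 GB, before any context is allocated.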

▲

tarruda a month ago | parent | prev | next [-]

> What’s the price point for getting into that sweet spot?

In October 2024 I got my Mac Studio M1 Ultra with 128G; IIRC it was ~$2500. With the recent price explosion, it has certainly gotten more expensive. https://frame.work/ is selling a 128G Strix Halo mainboard for $2700, but you have to add storage and a case.

▲

smcleod a month ago | parent | prev | next [-]

Really, right now it's the M5 Max MacBook Pro 128GB. The RTX6000 is a nice card, but you'd need more than one of them and you have to have a desktop to suit. The DGX Spark is slow and has pretty limited software support.

▲

ttoinou a month ago | parent | prev | next [-]

M5 Max 64GB (sweet spot) or 128GB (only 1000 USD more, better to keep it for the future) are the best quality-price ratio: future-proof, reliable, resellable, and flexible across workloads. Being harder to use as a server might be the only drawback.

▲

throwaw12 a month ago | parent | next [-]

What do you recommend for a non-Mac setup? I am a Mac user, but it's getting expensive, and I'm not seeing a reason to jump to the latest M5.

▲

barbacoa a month ago | parent | next [-]

Try looking into the Ryzen AI Max 395. AMD made a CPU/GPU SoC with unified memory specifically for AI inference. You can buy mini PCs with up to 128GB RAM.

▲

krzyk a month ago | parent | next [-]

Isn't CUDA/NVIDIA the go-to solution for most local models, with the rest being second-class citizens?

	▲	gcr a month ago | parent [-]
		Depends. ROCm is pretty well-supported for example. Non-NVIDIA backends tend to get less support and new features land slower, or features that are expected to improve performance wind up hurting it instead. That sort of thing. For basic “token in/token out” workloads without fine tuning, it’s probably fine ??

▲

simple10 a month ago | parent | prev [-]

The Ryzen AI Max 395 128gb is super cool, but not fast for inference. Order of magnitude slower than dedicated GPU but at half the cost. You can run larger models on it but it's slow. Great for local async work. Not great for daily chat or code agent driver.

▲

throwa356262 a month ago | parent [-]

The latest NPUs are pretty fast, I think what is missing is more optimised software support.

	▲	plagiarist a month ago | parent [-]
		The vRAM bandwidth is at least as much a problem as compute on these; there is a lot of data to shuffle around.
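To put a number on the bandwidth point: a common back-of-the-envelope ceiling for token generation is memory bandwidth divided by the bytes of weights read per token. The sketch below uses the ~220 GB/s real-world Strix Halo figure quoted earlier in the thread and assumes 4-bit weights, that every active weight is read exactly once per token, and that "A3B" means roughly 3B active parameters. It ignores KV cache reads, activations, and compute limits, so real speeds will be lower. The function name is hypothetical.

```python
def decode_ceiling_tokens_per_s(
    bandwidth_gbs: float, active_params_billion: float, bits_per_weight: float
) -> float:
    """Upper bound on generated tokens/s when decoding is purely bandwidth-bound."""
    bytes_read_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_read_per_token


bandwidth = 220  # GB/s, real-world figure quoted in the thread

# Dense 27B model: all 27B weights are read for every token.
dense = decode_ceiling_tokens_per_s(bandwidth, 27, 4)

# MoE 35B-A3B model: ASSUMING only ~3B parameters are active per token.
moe = decode_ceiling_tokens_per_s(bandwidth, 3, 4)

print(f"27B dense ceiling:   ~{dense:.0f} tok/s")
print(f"35B-A3B MoE ceiling: ~{moe:.0f} tok/s")
```

Under these assumptions the ceiling is inversely proportional to active parameters, which is why sparse MoE models are so much more usable on these unified-memory boxes than dense models of similar total size.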

▲

varispeed a month ago | parent | prev [-]

Probably a comparable non-Mac setup will be Threadripper, but it will become much more expensive. My view is that actually Apple products are the cheapest on the market when it comes to performance.

▲

roger_ a month ago | parent | prev [-]

M5 Max 128GB for $1k?

	▲	tempoponet a month ago | parent | next [-]
		The memory upgrade is $1k on a MacBook Pro. The laptop is ~$5500.
	▲	smallerize a month ago | parent | prev [-]
		I think they mean the upgrade to 128GB is +$1k.

▲

tandr 25 days ago | parent | prev | next [-]

Don't mind me asking, but where did you find $5k RTX 6000? Even 48GB model (previous gen) shows minimum at 7k, and 96GB one (Blackwell) is ~10k on Amazon...

▲

CamperBob2 25 days ago | parent [-]

$5K is presumably what it costs to pay some local gangsters to break into an nVidia warehouse. That's the only way you will pay $5K for an RTX 6000 for the next couple of years.

The server edition has gone up $2K in the last couple of weeks alone, at the outlet where I bought one previously.

	▲	tandr 24 days ago | parent [-]
		Man... Does it mean that buying RTX 6000 for 10k today is actually becoming an investment?

▲

pulse-dev a month ago | parent | prev [-]

[dead]