skiing_crawling 6 hours ago

"it can fit" on 256GB of RAM, but it will be heavily quantized and still run very slowly. The headline number is not token generation, it's prompt processing (PP). So if you get 10 tok/s and an API gives you 20-30 tok/s, it doesn't seem that bad on its face, but a Mac Studio or any other machine that's not loading all of it into GPU will do PP 20-50X slower than a purely GPU-based setup, which is what actually makes this unusable without $50k in GPUs.

On top of that, you will still be heavily quantized.
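
To put rough numbers on the prompt-processing point, here is a minimal sketch that splits one request into a prefill phase and a decode phase. The function name, the prompt and output sizes, and every throughput figure are illustrative assumptions, not benchmarks of any real machine; the only thing taken from the comment is the shape of the argument (similar decode speed, PP tens of times slower).

```python
def request_seconds(prompt_tokens, output_tokens, pp_tok_s, tg_tok_s):
    """Rough end-to-end latency: prompt processing (prefill) plus token generation."""
    prefill = prompt_tokens / pp_tok_s   # time before the first output token
    decode = output_tokens / tg_tok_s    # time to stream the answer
    return prefill + decode

# Hypothetical workload: a 20k-token prompt (e.g. a coding agent's context)
# and a 500-token answer. All rates below are assumed for illustration.
local = request_seconds(20_000, 500, pp_tok_s=100, tg_tok_s=10)
gpu = request_seconds(20_000, 500, pp_tok_s=3_000, tg_tok_s=25)

print(f"local: {local:.0f} s, GPU setup: {gpu:.0f} s")
```

With these assumed rates the decode gap is only 2.5x, but the 30x prefill gap dominates the total, which is the effect the comment describes for long prompts.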

gerdesj 5 hours ago | parent [-]

An Nvidia Spark thingie has 128GB unified RAM. They also have a dual-port version of one of these things: https://www.nvidia.com/content/dam/en-zz/Solutions/networkin.... i.e. 2 x 100GB/s ports, they may even be 2 x 200GB/s. Once I've got my paws on one, I'll know more.

You can cluster these beasts too. Two and three (with two IP subnets) is fairly obvious. Four or more might need a switch depending on how much network latency affects things.

Apple seem to have forgotten about M series with gobs of RAM. I can't get the Apple shop to show more than 96GB of unified RAM and that costs a kidney.

mapontosevenths 5 hours ago | parent | next [-]

I have one, and I love it. That said, my buddy's Mac smokes it for inference workloads in terms of tokens per second AND it's more usable for other things.

If you are training and doing research it's great, and if you want to cluster them it can't be beat, but if you just want local inference on a single box, buy a Mac or even a Strix Halo device.

colinsane 3 hours ago | parent | next [-]

can those macs boot linux? i've heard about Asahi but have no idea how far along they are. i've got my fleet configured with nix and sure, nix can target darwin, but there's a _lot_ of sharp edges there: i don't really want to pull that thread unless i have to...

mapontosevenths 3 hours ago | parent [-]

I don't know. I think he just uses LMStudio most of the time on his, but that's one place I can say the spark really shines for me.

I'm a Linux guy, but also don't always have a lot of time. The Spark comes out of the box with a nice Linux distro that's pre-configured to be easy to set up, and the guides and online resources make getting up and running trivial, even for some complex tasks. You would have to do a LOT of tinkering just to figure out some of the things the Nvidia resources walk you through natively. They have guides for a ton of stuff that include the optimal settings so you don't have to figure it all out through trial and error.

Check out these "playbooks" for some examples. [0] There's a lot to be said for not having to piece all that together yourself.

[0] https://build.nvidia.com/spark

I think going from unboxing mine, to setting it up to run headless, to generating tokens was like 20 minutes total for me.

Fizz43 4 hours ago | parent | prev [-]

which mac is smoking the spark?

pmarreck 4 hours ago | parent [-]

pretty much any of them, dude, as long as you have enough RAM, since it uses unified RAM and a powerful SoC CPU/GPU. Literally any M-class model, but the M5 is currently top tier.

dannyw 2 hours ago | parent | next [-]

The DGX Spark has basically the same memory bandwidth as an M5 Pro, and far more than an M5.

Only the M3 Ultra really beats it, and once you start scoping out the cost of an M3 Ultra with 128GB or 256GB, the DGX Spark doesn’t look bad after all.

fsuts 38 minutes ago | parent | prev | next [-]

How noisy does his fan get…

mapontosevenths 3 hours ago | parent | prev [-]

Yep. Memory bandwidth is what decides how fast LLMs generate tokens (mostly). The DGX Spark has something like 270 GB/s of memory bandwidth, and the M5 Ultra is ~615 GB/s. Theoretically DOUBLE the speed. In practice he only generates like 25% more tok/s, but that's still very impressive.

The Spark can fine-tune models in 1/4 the time and excels at other compute tasks in ways that a Mac never can. Plus the high-bandwidth ConnectX-7 ports would be like $1700 to buy on a card just for the network adapters... But for generating tokens, it just plain loses.
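
The "memory bandwidth decides tok/s" rule of thumb can be sketched as a back-of-envelope ceiling. This assumes every generated token needs one full read of the active weights from memory and ignores KV cache traffic, compute limits and software overhead, so it is an upper bound, not a prediction. The two bandwidth figures are the ones quoted in the comment above; the 40 GB model size and the function name are hypothetical.

```python
def decode_tok_s_ceiling(bandwidth_gb_s, active_weights_gb):
    """Upper bound on decode speed if each token requires reading
    all active weights from memory exactly once (simplifying assumption)."""
    return bandwidth_gb_s / active_weights_gb

MODEL_GB = 40  # hypothetical quantized model size, for illustration only

spark = decode_tok_s_ceiling(270, MODEL_GB)  # bandwidth figure as quoted in the thread
mac = decode_tok_s_ceiling(615, MODEL_GB)    # bandwidth figure as quoted in the thread
ratio = mac / spark                          # model size cancels out

print(f"ceilings: {spark:.2f} vs {mac:.2f} tok/s, ratio {ratio:.2f}x")
```

The ratio depends only on the two bandwidth numbers, so the "theoretically double" claim is the ceiling; the roughly 25% gap reported in practice suggests other bottlenecks matter too.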

jauntywundrkind 4 hours ago | parent | prev | next [-]

200 Gb/s (not GB/s)!

(Still potentially very useful! But not magically ultra fast.)
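
For anyone tripping over the units, a small sketch of the bits-versus-bytes conversion. The 200 Gb/s figure is from the comment above and the ~273 GB/s memory bandwidth is the figure quoted elsewhere in this thread; the function name is made up, and protocol overhead on the link is ignored.

```python
BITS_PER_BYTE = 8

def gbit_to_gbyte_per_s(gbit_per_s):
    """Convert a link rate in gigabits per second to gigabytes per second."""
    return gbit_per_s / BITS_PER_BYTE

link_gb_s = gbit_to_gbyte_per_s(200)    # one 200 Gb/s port, raw line rate
memory_gb_s = 273                       # Spark memory bandwidth as quoted in the thread
slowdown = memory_gb_s / link_gb_s      # how much slower the link is than local RAM

print(f"{link_gb_s} GB/s per port, about {slowdown:.1f}x slower than local memory")
```

So a single port moves about 25 GB/s at best, roughly an order of magnitude below local memory bandwidth, which is why clustering is useful but not magically fast.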

Computer0 5 hours ago | parent | prev [-]

128 GB of much slower RAM than Apple.

dannyw 2 hours ago | parent [-]

DGX Spark is ~273GB/s. That’s about M5 Pro territory, and twice as fast as the M5. You’d have to go to the M5 Max, or M3 Ultra, to get higher memory bandwidth than the Spark.