dahart an hour ago
I have so many questions… Since Apple already sells unified memory systems, what is the market opportunity you envision? Do you see Nvidia and Apple as competitors, and how? (And I’m not suggesting they’re not, necessarily, but I want to hear where you’re coming from, and they do have very different markets.) Hasn’t Apple used storage size (RAM & disk) for market segmentation for decades? And how does a machine with 128GB unified mem not potentially cut into some people’s reasons for wanting a 96GB GPU?
JohnBooty an hour ago
I'm not the person you're replying to, but I wholeheartedly agree with them...
Quick background: doing AI inference requires three things. Lots of memory, lots of memory bandwidth, and of course plenty of compute that has access to that memory.
Quick reference: nVidia 5090 has 1,792 GB/sec bandwidth. 3090 gets about 1000 GB/sec. DGX Spark and AMD 395 whatever get about 275 GB/sec. Apple M1 Max gets 400GB/sec, M5 Max gets 614GB/sec. Ultra variants get 2x that bandwidth, base variants get 1/2 that bandwidth.
However... Apple's compute is rather weak. Right now, Apple's offerings are juuuuuust fast enough to run dense 27B models at usable speeds at like, 10% of the performance/watt of nVidia. They're world-leading general purpose CPUs but not killer GPUs.
By all accounts, these Windows PCs nVidia is touting seem to have DGX Spark like performance, which is less than impressive. Same with the upcoming AMD AI-oriented consumer stuff.
The other context here is that running your own AI at home is just starting to become feasible in terms of open model availability and the ability to run it at usable speeds. Many are interested in it for reasons of privacy, security, and cost certainty vs. buying tokens.
nVidia and AMD can't make their consumer offerings too good at AI, because that risks interfering with their higher-margin data center sales. (And, let's face it: even if nVidia did release a 6090 with 64-128GB of memory for an affordable price, consumers wouldn't get their hands on them anyway because people would just start filling data centers with them.)
So. Now you see Apple's opportunity, right? No data center sales to interfere with. No relationship with nVidia or AMD to worry about. They could choose to make an absolute beast of a home AI machine. The M5 Ultra, if announced, might be that.
It's admittedly a niche market, but people are already buying 64GB+ Macs faster than Apple can make them and they're fetching high prices on the used market as well. The only real questions are whether this market is even something Apple would find time to care about, and whether they could secure enough DRAM to make a go at it. They are enormous, obviously, but they're feeling the RAM pinch just like everybody.
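To put rough numbers on why memory bandwidth matters so much here, a back-of-envelope sketch follows. It assumes decoding a dense model is bandwidth-bound, so every generated token has to stream all the weights from memory once, and it assumes 4-bit quantized weights. Both are my assumptions, not measurements, and it ignores KV cache traffic and other overhead, so real speeds will come in below these ceilings. The function names are made up for illustration; the bandwidth figures are the ones quoted above.

```python
# Back-of-envelope ceiling on decode speed for a dense model.
# Assumption: decode is memory-bandwidth-bound, so
#   tokens/sec <= memory bandwidth / bytes of weights read per token.

# Bandwidth figures (GB/sec) as quoted in the comment above.
BANDWIDTH_GB_S = {
    "RTX 5090": 1792,
    "RTX 3090": 1000,
    "DGX Spark": 275,
    "M1 Max": 400,
    "M5 Max": 614,
}


def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (ignores KV cache and activations)."""
    return params_billions * bits_per_weight / 8


def max_tokens_per_sec(bandwidth_gb_s: float, params_billions: float,
                       bits_per_weight: float) -> float:
    """Upper bound on tokens/sec if every token reads all weights once."""
    return bandwidth_gb_s / model_size_gb(params_billions, bits_per_weight)


if __name__ == "__main__":
    # Dense 27B model, 4-bit weights (assumed quantization level).
    for name, bandwidth in BANDWIDTH_GB_S.items():
        ceiling = max_tokens_per_sec(bandwidth, 27, 4)
        print(f"{name}: at most ~{ceiling:.0f} tokens/sec")
```

Treat the output as a ceiling only. It says nothing about compute, which is the other half of the problem.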
jmyeet an hour ago
Apple offers relatively affordable options for a high-memory workstation that uses unified memory. They previously offered 256/512GB Mac Studios (both discontinued). Because of this they can keep larger models in memory.
BUT you just can't compete with NVidia performance for LLM workloads (mostly inference) for two reasons:
1. The memory bandwidth just can't compete with a 5090 (1800GB/s). The best current Mac is ~900GB/s. That directly caps tokens/sec and might be manageable but there's another problem; and
2. The raw FLOPS just can't compete with even a 5090. It probably needs to natively support FP4/FP8 to at least maintain number format parity with NVidia. But besides that, NVidia just has more raw FLOPS. According to Google, an M5 Max does ~70 FP16 TFLOPS while a 5090 does 380.
If Apple can close that gap to at least be competitive and also hold larger models in shared VRAM, that would be a competitive advantage and it would directly attack NVidia's market segmentation.
The Mac Studio last came out March last year. So we may get an update in Q3. Many are pinning their hopes on this. But it might not happen until next year. When it was released the M4 was the state of the art and it came with either the M4 Max or M3 Ultra (which, as I understand it, is basically 2 M3 Max dies stuck together, kind of).
What people are hoping for is an M5 Ultra with >1000GB/s of memory bandwidth, ideally 200+ FP16 TFLOPS and hopefully FP4/FP8 support. You can chain Mac Studios together into a cluster with TB5 too.
But it's reasonably likely that the next Mac Studio will be only incrementally better than the last generation.
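For a feel of what the FLOPS gap means in practice, here is a rough sketch of prompt-processing (prefill) time, which tends to be compute-bound rather than bandwidth-bound. It assumes the common approximation of about 2 FLOPs per parameter per token for a forward pass through a dense model, and it assumes the chip hits its peak TFLOPS, which real workloads do not. The function name, the 27B model size, and the 8,000-token prompt are illustrative choices of mine; the TFLOPS figures are the ones quoted above.

```python
# Rough lower bound on prefill time for a dense model.
# Assumptions: ~2 FLOPs per parameter per token (forward pass only),
# and the hardware sustains its quoted peak FP16 throughput.


def prefill_seconds(params_billions: float, prompt_tokens: int,
                    peak_tflops: float) -> float:
    """Best-case seconds to process a prompt at peak compute throughput."""
    total_flops = 2 * params_billions * 1e9 * prompt_tokens
    return total_flops / (peak_tflops * 1e12)


if __name__ == "__main__":
    # FP16 TFLOPS figures as quoted in the comment above.
    for name, tflops in {"M5 Max": 70, "RTX 5090": 380}.items():
        seconds = prefill_seconds(27, 8000, tflops)
        print(f"{name}: at least ~{seconds:.1f} s for an 8,000-token prompt")
```

Under these assumptions the ratio between the two machines is just the ratio of their TFLOPS, so the absolute numbers matter less than the roughly 5x gap.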