| ▲ | pjs_ 5 hours ago |
| Continue to believe that Cerebras is one of the most underrated companies of our time. It's a dinner-plate sized chip. It actually works. It's actually much faster than anything else for real workloads. Amazing |
|
| ▲ | onlyrealcuzzo 5 hours ago | parent | next [-] |
Nvidia seems cooked. Google is crushing them on inference. By TPUv9, Google could be 4x more energy efficient and cheaper overall (even if Nvidia cuts its margins from 75% to 40%). Cerebras will be substantially better for agentic workflows in terms of speed. And if you care less about speed and more about cost and energy, Google will still crush Nvidia. Nvidia won't be cheaper for training new models either, and the vast majority of chips will be used for inference by 2028 instead of training anyway.

Nvidia has no manufacturing moat: anyone can buy TSMC's output.

Power is the bottleneck in the US (and everywhere besides China). By TPUv9, Google is projected to be 4x more energy efficient, so it's a no-brainer who you're going with starting with TPUv8, when Google lets you run on-prem. These are GW-scale data centers. You can't just build 4 large-scale nuclear power plants in a year in the US (or anywhere, even China). You can't just build 4 GW of solar farms in a year in the US to power your less efficient data center. Maybe you could in China (if the economics were on your side, but they aren't). You sure as hell can't do it anywhere else (maybe India).

What am I missing? I don't understand how Nvidia could have been so far ahead and just let every part of the market slip away. |
| |
| ▲ | sailingparrot 4 hours ago | parent | next [-] | | > let every part of the market slip away. Which part of the market has slipped away, exactly?
Everything you wrote is supposition and extrapolation. Nvidia has a chokehold on the entire market. All other players still exist in the small pockets that Nvidia doesn’t have enough production capacity to serve.
And their dev ecosystem is still so far ahead of anyone else's. Which provider gets chosen to equip a 100k-chip data center goes far beyond raw chip power. | | |
| ▲ | onlyrealcuzzo 4 hours ago | parent [-] | | > Nvidia has a chokehold on the entire market. You're obviously not looking at expected forward orders for 2026 and 2027. | | |
| ▲ | louiereederson 2 hours ago | parent [-] | | I think most estimates have Nvidia at more or less stable share of CoWoS capacity (around 60%), which is ~doubling in '26. |
|
| |
| ▲ | mnicky 4 hours ago | parent | prev | next [-] | | > What am I missing? Largest production capacity maybe? Also, market demand will be so high that every player's chips will be sold out. | | | |
| ▲ | icelancer 42 minutes ago | parent | prev | next [-] | | > What am I missing? VRAM capacity, given the Cerebras/Groq architecture compared to Nvidia's. In parallel, the RAM contracts that Nvidia has negotiated well into the future, which other manufacturers have been unable to secure. | |
| ▲ | wing-_-nuts 4 hours ago | parent | prev | next [-] | | Man, I hope someone drinks Nvidia's milkshake. They need to get humbled back to the point where they're desperate to sell GPUs to consumers again. The only major roadblock is CUDA... | |
| ▲ | whism 5 hours ago | parent | prev | next [-] | | I believe they licensed something from Groq. | |
| ▲ | Handy-Man 4 hours ago | parent | prev [-] | | Well, they `acquired` Groq for a reason. |
|
|
| ▲ | zozbot234 5 hours ago | parent | prev | next [-] |
| It's "dinner-plate sized" because it's just a full silicon wafer. It's nice to see that wafer-scale integration is now being used for real work but it's been researched for decades. |
|
| ▲ | arcanemachiner 5 hours ago | parent | prev | next [-] |
| Just wish they weren't so insanely expensive... |
| |
| ▲ | azinman2 5 hours ago | parent [-] | | The bigger the chip, the worse the yield. | | |
| ▲ | speedgoose 4 hours ago | parent | next [-] | | I suggest reading their website; they explain pretty well how they manage good yield, though I'm not an expert in this field. It does make sense, and I would be surprised if they were caught lying. | |
| ▲ | moralestapia 5 hours ago | parent | prev [-] | | This comment doesn't make sense. | | |
| ▲ | Sohcahtoa82 4 hours ago | parent | next [-] | | One wafer will turn into multiple chips. Defects are best measured on a per-wafer basis, not per-chip. So if your chips are huge and you can only put 4 chips on a wafer, 1 defect can cut your yield by 25%. If they're smaller and you fit 100 chips on a wafer, then 1 defect on the wafer is only cutting yield by 1%. Of course, there's more to this when you start reading about "binning", fusing off cores, etc. There's plenty of information out there about how CPU manufacturing works, why defects happen, and how they're handled. Suffice it to say, the comment makes perfect sense. | | |
| ▲ | snovv_crash 4 hours ago | parent [-] | | That's why you typically fuse off defective sub-units and just have a slightly slower chip. GPU and CPU manufacturers have done this for at least 15 years now, that I'm aware of. |
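A minimal sketch of how fusing off defective sub-units rescues yield, with entirely made-up numbers (the sub-unit count and per-sub-unit defect rate are hypothetical, not figures from any vendor):

    # Toy redundancy model: a die has `cores` required sub-units plus `spares`
    # extra ones; it still ships if the number of defective sub-units is at
    # most `spares`. Assumes independent defects, which is a simplification.
    from math import comb

    def survival_prob(cores: int, spares: int, p_defect: float) -> float:
        total = cores + spares
        return sum(
            comb(total, k) * p_defect**k * (1 - p_defect) ** (total - k)
            for k in range(spares + 1)
        )

    # Hypothetical: 100 sub-units, 0.5% defect rate per sub-unit.
    print(survival_prob(100, 0, 0.005))  # no redundancy: ~0.61
    print(survival_prob(100, 2, 0.005))  # two spare sub-units: ~0.985

The same idea is how a wafer-scale part can claim usable yield despite its size: the unit that has to be perfect is a small sub-block, not the whole die.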
| |
| ▲ | azinman2 5 hours ago | parent | prev | next [-] | | Sure it does. If it’s many small dies on a wafer, then imperfections don’t ruin the entire batch; you just bin those components. If the entire wafer is a single die, you have much less tolerance for errors. | | | |
| ▲ | louiereederson 2 hours ago | parent | prev | next [-] | | You say this with such confidence and then ask if smaller chips require smaller wafers. | |
| ▲ | DocJade 5 hours ago | parent | prev [-] | | Bigger chip = more surface area = higher chance for somewhere in the chip to have a manufacturing defect. Yields on silicon are great, but not perfect. | |
| ▲ | moralestapia 4 hours ago | parent [-] | | Does that mean smaller chips are made from smaller wafers? | | |
| ▲ | Sohcahtoa82 2 hours ago | parent [-] | | Nope. They use the same size wafers and then just put more chips on a wafer. | | |
| ▲ | moralestapia 2 hours ago | parent [-] | | So, does a wafer with a huge chip have more defects per area than a wafer with 100s of small chips? | |
| ▲ | dgfl an hour ago | parent [-] | | No, the expected number of defects per wafer is the same either way; what changes is how much each defect costs you. If a chip has a defect, then it is lost (a simplification, ignoring binning). A wafer with 100 chips may lose 10 to defects, giving a yield of 90%. The same wafer but with 1000 smaller chips would still lose only those 10, giving a 99% yield. |
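A rough sketch of that arithmetic, extended with the classic Poisson yield model Y = exp(-defect_density * die_area); the defect count and wafer area below are hypothetical placeholders, not real fab data:

    from math import exp

    DEFECTS_PER_WAFER = 10    # hypothetical expected defect count per wafer
    WAFER_AREA_MM2 = 70_000   # roughly the area of a 300 mm wafer

    def poisson_yield(die_area_mm2: float) -> float:
        """Fraction of dies expected to have zero defects."""
        defect_density = DEFECTS_PER_WAFER / WAFER_AREA_MM2  # defects per mm^2
        return exp(-defect_density * die_area_mm2)

    for dies_per_wafer in (1, 4, 100, 1000):
        die_area = WAFER_AREA_MM2 / dies_per_wafer
        print(f"{dies_per_wafer:>4} dies/wafer -> ~{poisson_yield(die_area):.1%} defect-free")
    #    1 die/wafer  -> ~0.0%  (why a wafer-scale part needs built-in redundancy)
    # 1000 dies/wafer -> ~99.0% (matches the 10-defect example above)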
|
|
|
|
|
|
|
|
| ▲ | dalemhurley 4 hours ago | parent | prev | next [-] |
| Yet investors keep backing NVIDIA. |
| |
| ▲ | vimda 3 hours ago | parent [-] | | At this point, tech investment and analysis is so divorced from any kind of reality that it's more akin to lemmings running off a cliff than careful analysis of fundamentals. |
|
|
| ▲ | latchkey 5 hours ago | parent | prev | next [-] |
Not for what they are using it for. It is $1M+ per chip, and they can fit one of them in a rack. Rack space in DCs is a premium asset; the density isn't there. AI models need tons of memory (this product announcement is a case in point) and they don't have it, nor do they have a way to get it, since they are last in line at the fabs. Their only chance is an acqui-hire, but Nvidia just spent $20B on Groq instead. Dead man walking. |
| |
| ▲ | boredatoms an hour ago | parent | next [-] | | Power/cooling is the premium; you can always build a bigger hall. | |
| ▲ | latchkey 41 minutes ago | parent [-] | | Exactly my point. Their architecture requires someone to invest the capex / opex to also build another hall. |
| |
| ▲ | p1esk 5 hours ago | parent | prev | next [-] | | The real question is: what's their perf/dollar vs Nvidia? | |
| ▲ | zozbot234 5 hours ago | parent | next [-] | | I guess it depends on what you mean by "perf". If you optimize everything for the absolute lowest latency given your power budget, your throughput is going to suck - and vice versa. Throughput is ultimately what matters when everything about AI is so clearly power-constrained; latency is a distraction. So TPU-like custom chips are likely the better choice. | |
| ▲ | p1esk 5 hours ago | parent | next [-] | | By perf I mean: how much does it cost to serve a 1T model to 1M users at 50 tokens/sec? | | |
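For concreteness, a back-of-envelope sketch of that question; every figure below (per-unit throughput, unit price) is a hypothetical placeholder rather than a real Cerebras/Nvidia/Google number, and it ignores power, networking, and memory-capacity constraints:

    import math

    CONCURRENT_USERS = 1_000_000
    TOKENS_PER_SEC_PER_USER = 50
    required_tps = CONCURRENT_USERS * TOKENS_PER_SEC_PER_USER  # 50M tokens/sec aggregate

    # Hypothetical hardware profiles: aggregate decode tokens/sec per unit, $ per unit.
    hardware = {
        "fast_low_density_unit": (200_000, 2_000_000),
        "commodity_gpu_node": (40_000, 250_000),
    }

    for name, (tps_per_unit, unit_price) in hardware.items():
        units = math.ceil(required_tps / tps_per_unit)
        capex = units * unit_price
        print(f"{name}: {units:,} units, ~${capex / 1e9:.2f}B up-front")
    # The answer swings by orders of magnitude with the tokens/sec each unit
    # actually sustains at realistic batch sizes and with the model's active
    # parameter count, which is what the replies below are arguing about.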
| ▲ | zozbot234 4 hours ago | parent [-] | | Not all 1T models are equal. E.g. how many active parameters? What's the native quantization? How long is the max context? Also, it's quite likely that some of the models in common use are even sub-1T. If your model is light enough, the lower throughput doesn't necessarily hurt you all that much and you can enjoy the lightning-fast speed. | |
| ▲ | p1esk 4 hours ago | parent | next [-] | | Just pick some reasonable values. Also, keep in mind that this hardware must still be useful 3 years from now. What's going to happen to Cerebras in 3 years? What about Nvidia? Which one is a safer bet? On the other hand, competition is good - Nvidia can't have the whole pie forever. | |
| ▲ | zozbot234 4 hours ago | parent [-] | | > Just pick some reasonable values. And that's the point - what's "reasonable" depends on the hardware and is far from fixed. Some users here are saying that this model is "blazing fast" but a bit weaker than expected, and one might've guessed as much. > On the other hand, competition is good - Nvidia can't have the whole pie forever. Sure, but arguably the closest thing to competition for Nvidia is TPUs and future custom ASICs that will likely save a lot on energy used per model inference, while not focusing all that much on being super fast. | |
| |
| ▲ | wiredpancake 3 hours ago | parent | prev [-] | | [dead] |
|
| |
| ▲ | fragmede 4 hours ago | parent | prev [-] | | > Throughput is ultimately what matters I disagree. Yes, it does matter, but because the popular interface is chat, streaming the results of inference feels better to the squishy, messy, gross human operating the chat, even if it ends up taking longer. You can give all the benchmark results you want; humans aren't robots. They aren't data-driven, they have feelings, and they're going to go with what feels better. That isn't true for all uses, but time to first byte is ridiculously important for human-computer interaction. | |
| ▲ | zozbot234 4 hours ago | parent [-] | | You just have to change the "popular interface" to something else. Chat is OK for trivia or genuinely time-sensitive questions; everything else goes through email or some sort of webmail-like interface where requests are submitted and replies come back asynchronously. (This is already how batch APIs work, but they only offer a 50% discount compared to interactive, which is not enough to really make a good case for them - especially not for agentic workloads.) |
|
| |
| ▲ | xnx 5 hours ago | parent | prev | next [-] | | Or Google TPUs. | | | |
| ▲ | latchkey 5 hours ago | parent | prev [-] | | Exactly. They won't ever tell you. It is never published. Let's not forget that the CEO is an SEC felon who got caught trying to pull a fast one. |
| |
| ▲ | spwa4 5 hours ago | parent | prev [-] | | Oh, don't worry. Ever since the power issue started developing, rack space is no longer at a premium. Or at least, it's no longer the limiting factor. Power is. | |
| ▲ | latchkey 5 hours ago | parent [-] | | The dirty secret is that there is plenty of power. But it isn't all in one place, and it is often stranded in DCs that can't do the density needed for AI compute. Training models needs everything in one DC; inference doesn't. |
|
|
|
| ▲ | femiagbabiaka 5 hours ago | parent | prev | next [-] |
| yep |
|
| ▲ | xnx 5 hours ago | parent | prev [-] |
Cerebras is a bit of a stunt, like "datacenters in spaaaaace". Terrible yield: one defect can ruin a whole wafer instead of just a chip region. Poor perf/cost (see above). Difficult to program. Little space for RAM. |
| |
| ▲ | the_duke 5 hours ago | parent | next [-] | | They claim the opposite, though, saying the chip is designed to tolerate many defects and work around them. | |
| ▲ | 5 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | 5 hours ago | parent | prev [-] | | [deleted] |
|