| ▲ | impulser_ 9 hours ago |
| Are they buying them to try and slow down open source models and protect the massive amounts of money they make from OpenAI, Anthropic, Meta, etc.? It's quite obvious that open source models are catching up to closed source models very fast; they're about 3-4 months behind right now. Yes, they are trained on Nvidia chips, but as open source models become more usable and get closer to closed source models, they will eat into Nvidia's profit, because the companies running them aren't spending tens of billions of dollars on chips to train and run inference. These are smaller models trained on fewer GPUs, and they perform as well as the previous OpenAI and Anthropic models. So obviously open source models are a direct threat to Nvidia, and the only thing open source models struggle with is scaling inference, which is where Groq and Cerebras come into the picture: they provide the fastest inference for open source models, which makes them even more usable than SOTA models. Maybe I'm way off on this. |
|
| ▲ | Workaccount2 9 hours ago | parent | next [-] |
| Shy of an algo breakthrough, open source isn't going to catch up with SOTA; their main trick for model improvement is distilling the SOTA models. That's why they have perpetually been "right behind". |
| |
| ▲ | impulser_ 9 hours ago | parent | next [-] | | They don't need to catch up. They just need to be good enough and fast as fuck. The vast majority of useful LLM tasks have nothing to do with how smart the model is. The GPT-5 models have been the most useless models released this year despite being SOTA, and it's because they're slow as fuck. | |
| ▲ | aschobel 9 hours ago | parent | next [-] | | For coding I don't use any of the previous gen models anymore. Ideally I would have both fast and SOTA; if I had to pick one I'd go with SOTA. There's a report by OpenRouter on what folks tend to pay for, and in the coding domain it's generally SOTA models. Folks are still paying a premium for them today. There is a question of whether there's a bar where coding models are "good enough"; for myself I always want smarter / SOTA. | |
| ▲ | wyre 7 hours ago | parent [-] | | FWIW coding is one of the largest usages of LLMs where SOTA quality matters. I think the bar for when coding models are "good enough" will be a tradeoff between performance and price. I could be using Cerebras Code and saving $50 a month, but Opus 4.5 is fast enough, and I value the peace of mind of knowing its quality is higher than Cerebras' open source models enough to spend the extra money. It might take a while for this gap to close, and what is considered "good enough" will be different for every developer, but certainly this gap cannot exist forever. |
| |
| ▲ | gejose 5 hours ago | parent | prev | next [-] | | > just need to be good enough and fast as fuck Hard disagree. There are very few scenarios where I'd pick speed (quantity) over intelligence (quality) for anything remotely to do with building systems. | | |
| ▲ | ssivark 2 hours ago | parent | next [-] | | If you thought a human working on something would benefit from being "agile" (building fast, shipping quickly, iterating, getting feedback, improving), why should it be any different from AI models? Implicit in your claim are specific assumptions about how expensive/untenable it is to build systemic guardrails and human feedback, and a specific cost/benefit ratio of approximate goal attainment versus perfect goal attainment. Rest assured that there is a whole portfolio of situations where different design points make the most sense. | |
| ▲ | nkmnz 42 minutes ago | parent [-] | | > why should it be any different from AI models? 1. law of diminishing returns - AI is already much, much faster at many tasks than humans, especially at spitting out text, so becoming even faster doesn’t always make that much of a difference.
2. theory of constraints - the throughput of a system is mostly limited by the "weakest link" or slowest part, which might not be the LLM but some human-in-the-loop, and that bottleneck might be reduced only by smarter AI, not by faster AI.
3. Intelligence is an emergent property of a system, not a property of its parts - in other words: intelligent behaviour is created through interactions. More powerful LLMs enable new levels of interaction that are just not available with less capable models. You don't want to bring a knife, not even the quickest one in town, to a massive war of nukes. |
| |
| ▲ | jameshush 5 hours ago | parent | prev | next [-] | | I agree with you for many use cases, but for the use case I'm focused on (Voice AI) speed is absolutely everything. Every millisecond counts for voice, and most voice use cases don't require anything close to "deep thinking." E.g., for inbound customer support use cases, we really just want the voice agent to be fast and follow the SOP. | |
| ▲ | nkmnz 38 minutes ago | parent [-] | | If you have a SOP, most of the decision logic can be encoded and strictly enforced. There is zero intelligence involved in this process; it's just if/else. The key part is understanding the customer request and mapping it to the cases encoded in the SOP - and for that part, intelligence is absolutely required, or your customers will not feel "supported" at all and would be better off with a simple form. |
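(A minimal sketch of the split being described: the model's only job is mapping a free-form request onto one of the SOP's cases, and everything after that is deterministic. The case names and the classify_intent helper are made up for illustration, not from any real product.)

    # Illustrative split between "understanding" (needs a model) and the SOP
    # itself (plain deterministic logic). classify_intent stands in for a fast
    # LLM call constrained to return one of the allowed case labels.
    SOP_CASES = {"reset_password", "check_order_status", "escalate_to_human"}

    def handle_request(utterance: str, classify_intent) -> str:
        case = classify_intent(utterance, allowed=SOP_CASES)  # the only "intelligent" step
        if case == "reset_password":
            return "I've sent a password reset link to the email on file."
        elif case == "check_order_status":
            return "Your order shipped yesterday and should arrive within two days."
        else:
            return "Let me connect you with a human agent."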
| |
| ▲ | gessha 3 hours ago | parent | prev [-] | | As long as the faster tech is reliable and I understand its quirks, I can work with it. |
| |
| ▲ | Aurornis 6 hours ago | parent | prev | next [-] | | > They don't need to catch up. They just need to be good enough The current SOTA models are impressive but still far from what I'd consider good enough to not be a constant exercise in frustration. And when the SOTA models still have a long way to go, the open weights models have an even bigger gap to close. | |
| ▲ | nl 9 hours ago | parent | prev | next [-] | | GPT 5 Codex is great - the best coding model around except maybe for Opus. I'd like more speed, but I prefer more quality over more speed. | |
| ▲ | Demiurge 7 hours ago | parent | prev | next [-] | | I get GPT 5.2 responses on copilot faster than for any other model, almost instantly. Are you sure they’re slow as fuck? | |
| ▲ | dontwannahearit 9 hours ago | parent | prev | next [-] | | Confused. Is ‘fuck’ fast or slow? Or both at the same time? Is there a sort of quantum superposition of fuck? | | | |
| ▲ | echelon 6 hours ago | parent | prev | next [-] | | This. You can distill a foundation model into open source. The Chinese will be doing this for us for a long time. We should be glad that the foundation model companies are stuck running on treadmills. Runaway success would be bad for everyone else in the market. Let them sweat. | |
| ▲ | nineteen999 9 hours ago | parent | prev [-] | | Bullseye. |
| |
| ▲ | _fizz_buzz_ 7 hours ago | parent | prev | next [-] | | > their main trick for model improvement is distilling the SOTA models Could you elaborate? How is this done and what does this mean? | | |
| ▲ | MobiusHorizons 7 hours ago | parent [-] | | I am by no means an expert, but I think it is a process that allows training LLMs from other LLMs without needing as much compute or nearly as much data as training from scratch. I think this was the thing deepseek pioneered. Don’t quote me on any of that though. | | |
| ▲ | tensor 3 hours ago | parent | next [-] | | No, distillation is far older than DeepSeek. DeepSeek was impressive because of algorithmic improvements that allowed them to train a model of that size with vastly less compute than anyone expected, even using distillation. I also haven't seen any hard data on how much they use distillation-like techniques. They for sure used a bunch of synthetically generated data to get better at reasoning, something that is now commonplace. | |
| ▲ | tickerticker 3 hours ago | parent | prev [-] | | Yes. They bounced millions of queries off of ChatGPT to teach/form/train their DeepSeek model. This bot-like querying was the "distillation." | | |
| ▲ | orbital-decay 25 minutes ago | parent | next [-] | | They definitely didn't. They demonstrated their stuff long before OAI and the models were nothing like each other. | |
| ▲ | SirMaster 2 hours ago | parent | prev [-] | | Why would OpenAI allow someone to do that? | | |
| ▲ | MadnessASAP an hour ago | parent [-] | | They didn't, but how do you stop it? Presuming the scale that OpenAI is running at? |
|
|
|
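(A rough sketch of what the distillation discussed in this subthread looks like in practice: a stronger "teacher" model generates responses, and a smaller "student" model is fine-tuned on them. Everything below, including the helper names and the assumption that the student model and tokenizer behave like typical PyTorch callables, is illustrative, not anyone's actual pipeline.)

    # Black-box ("API") distillation sketch: query a teacher model for responses,
    # then fine-tune a smaller student on the (prompt, response) pairs with
    # ordinary next-token cross-entropy. When teacher logits are available,
    # classic distillation instead matches the student's output distribution
    # to the teacher's (soft labels / KL loss).
    import torch.nn.functional as F

    def build_distillation_dataset(teacher_generate, prompts):
        # teacher_generate: callable prompt -> response text (e.g. an API call)
        return [(p, teacher_generate(p)) for p in prompts]

    def sft_step(student, tokenize, batch, optimizer):
        # One supervised fine-tuning step on teacher-generated text.
        # tokenize is assumed to return a LongTensor of token ids [batch, seq];
        # student is assumed to map token ids -> logits [batch, seq, vocab].
        ids = tokenize([prompt + response for prompt, response in batch])
        logits = student(ids[:, :-1])
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),   # predict each next token...
            ids[:, 1:].reshape(-1),                # ...against the teacher's text
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()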
| |
| ▲ | mistercheph 6 hours ago | parent | prev | next [-] | | Too bad, so sad for the Mister Krabs secret recipe-pilled labs. Shy of something fundamental changing, it will always be possible to make a distillation that is 98% as good as a frontier model for ~1% of the cost of training the SOTA model. Some technology just wants to be free :) | |
| ▲ | stx5 6 hours ago | parent | prev [-] | | [dead] |
|
|
| ▲ | vachina 34 minutes ago | parent | prev | next [-] |
| More like they’re trying to snuff out potential competitors. Why work as hard to push your own products if NVIDIA gave you money to retire for the rest of your life? |
|
| ▲ | nl 9 hours ago | parent | prev | next [-] |
| NVIDIA releases some of the best open source models around. Almost all open source models are trained, and mostly run, on NVIDIA hardware. Open source is great for NVIDIA. They want more open source, not less. "Commoditize your complement" is business 101. |
| |
| ▲ | impulser_ 9 hours ago | parent [-] | | Then why are they spending $20 billion to handicap an inference company that's giving open source models a major advantage over closed source models? | |
| ▲ | gpapilion 4 hours ago | parent | next [-] | | Realistically Groq is a great solution but has near-impossible requirements for deployment. Just look at how many adapters you need to meet the memory requirements of a small LLM. SRAM is fast but small; you need something like 75 adapters for an 8B-parameter model. I would guess their interconnect technology is what NVIDIA wants: they had some really interesting tech to make the accelerator-to-accelerator communication work and scale. They were able to do that well before NVL72, and they scale to hundreds of adapters, since large models require still more adapters. We will know in a few months. |
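(Rough back-of-envelope for why the adapter count gets so high, assuming roughly 230 MB of on-chip SRAM per first-generation Groq LPU and 16-bit weights; treat both figures as approximate:)

    params = 8e9                   # 8B-parameter model
    bytes_per_param = 2            # FP16/BF16 weights
    weight_bytes = params * bytes_per_param    # ~16 GB of weights alone
    sram_per_chip = 230e6          # ~230 MB SRAM per LPU (assumed figure)
    chips = weight_bytes / sram_per_chip       # ~70 chips, before KV cache and activations
    print(round(chips))            # -> 70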
| ▲ | nl 5 hours ago | parent | prev | next [-] | | > handicap Your words. Because it's very good tech for inference? It doesn't even do training. And most inference providers for open source models use NVIDIA, e.g. Fireworks, Baseten, Together AI, etc. Most NVIDIA sales go to training clusters. That is changing, but it'd be an interesting strategy to differentiate the training and inference lines. |
| ▲ | credit_guy 8 hours ago | parent | prev | next [-] | | > to handicap an inference company That's a non-charitable interpretation of what happened. They are not "spending $20 billion to handicap Groq". They are handing Groq $20 billion to do whatever they want with it. Groq can take this money and build more chips, do more R&D, hire more people. $20 billion is truly a lot of money. It's quite hard to "handicap" someone by giving them $20 billion. | |
| ▲ | wmf 8 hours ago | parent [-] | | Groq doesn't have any employees. They can't do R&D because there's no one to do it. The $20B goes to Groq's investors. | | |
| ▲ | credit_guy 5 hours ago | parent [-] | | From the article: > Groq added that it will continue as an “independent company,” led by finance chief Simon Edwards as CEO.
The $20B does not go to Groq's investors. It goes to Groq. You could say that since Groq is owned by its investors it amounts to the same thing, but it doesn't. In order for the money to go to the investors, Groq would need to pay a dividend or buy back shares, and there is no indication that this will happen. What's more, the investors don't even need it to happen: I'm sure any investor who wants to sell their shares in Groq will now find plenty of buyers at a very advantageous price. | |
| ▲ | wmf 5 hours ago | parent [-] | | Let's bet on this shit. Where's the Polymarket. |
|
|
| |
| ▲ | p1esk 7 hours ago | parent | prev [-] | | > they spending $20 billion dollars to handicap an inference company Inference hardware company |
|
|
|
| ▲ | Kiboneu 3 hours ago | parent | prev | next [-] |
| >Are they buying them to try and slow down open source models The opposite, I think. Why do you think that local models are a direct threat to Nvidia? Why would Nvidia let a few of their large customers gain more leverage by not diversifying to consumers? OpenAI decided to eat into Nvidia's manufacturing supply by buying DRAM; that's concretely threatening behavior from one of Nvidia's larger customers. If Groq sells technology that allows local models to be used better, why would that /not/ be a profit source for Nvidia to incorporate? Nvidia owes a lot of their success to the consumer market. This is a pattern in the history of computer tech development. Intel forgot this. AMD knows this. See where everyone is now. Besides, there are going to be more Groqs in the future. Is it worth spending ~$20B on each of them just to keep a choke-hold on the consumer market? Nvidia can afford to look further. It'd be a lot harder to assume good faith if OpenAI ended up buying Groq. Maybe Nvidia knows this. |
| |
| ▲ | deaux 2 hours ago | parent [-] | | > Besides, there are going to be more Groqs in the future. And likely some of them are going to be in countries that won't let them sell out to Nvidia. |
|
|
| ▲ | ilaksh 8 hours ago | parent | prev | next [-] |
| Yes, you are way off, because Groq doesn't make open source models. Groq makes innovative AI accelerator chips that are significantly faster than Nvidia's. |
| |
| ▲ | zamalek 8 hours ago | parent | next [-] | | > Groq makes innovative AI accelerator chips that are significantly faster than Nvidia's. Yeah, I'm disappointed by this; it's clearly a move to take them out of the market. Still, that leaves a vacuum for someone else to fill. I was extremely impressed by Groq last time I messed about with it; the inference speed was bonkers. | |
| ▲ | LoganDark 8 hours ago | parent | prev [-] | | For inference, but yes. Many hundreds of tokens per second of output is the norm, in my experience. I don't recall the prompt processing figures but I think it was somewhere in the low hundreds of tokens per second (so slightly slower than inference). |
|
|
| ▲ | heavyset_go 8 hours ago | parent | prev | next [-] |
| Nvidia just released their Nemotron models, and in my testing, they are the best performing models on low-end consumer hardware in terms of both speed and accuracy. |
|
| ▲ | ymck 9 hours ago | parent | prev | next [-] |
| I'd say that it's probably not a play against open source, but more an attempt to remove/change the bottlenecks in the current chip production cycle. Nvidia likely doesn't care who wins; they just want to sell their chips. They literally can't make enough to meet current demand. If they split off the inference business (and now own one of the only purchasable alternatives) they can spin up more production. That said, it's completely anti-competitive. Nvidia could design an inference chip themselves, but instead they are locking down one of the only real independents. But... nobody was saying Groq was making any real money. This might just be a rescue mission. |
|
| ▲ | SkyPuncher 9 hours ago | parent | prev | next [-] |
| They need to vertically integrate the entire stack or they die. All of the big players are already making plans for their own chips/hardware. They see everyone else competing for the exact same vendor’s chips and need to diversify. |
|
| ▲ | ramoz 7 hours ago | parent | prev | next [-] |
| They acquired Groq in order to have an ASIC competitor to Google's TPU. |
|
| ▲ | matthewfcarlson 9 hours ago | parent | prev | next [-] |
| Idk - cheaper inference seems to be a huge industry secret, and providing the best inference tech that only works with Nvidia seems like a good plan. Making Nvidia the absolute king of compute against AWS/AMD/Intel seems like a no-brainer. |
|
| ▲ | __mharrison__ 9 hours ago | parent | prev [-] |
| How does this work considering the Nemotron models? |