▲ | oceanplexian 5 days ago |
I feel like it's totally the opposite. The differentiator is the fact that the scaling myth was a lie. The GPT-5 flop should make that obvious enough. These guys are spending billions and can't make the models show more than a few % improvement. You need to actually innovate, e.g. tricks like MoE, tool calling, better cache utilization, concurrency, better prompting, CoT, data labeling, and so on. Not two weeks ago some Chinese academics put out a paper called Deep Think With Confidence where they coaxed GPT-OSS-120B into thinking a little longer, causing it to perform better on benchmarks than it did when OpenAI released it.
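Roughly the trick, as I understand it, is to sample several reasoning traces, keep only the ones the model is most confident in, and majority-vote among those. Here's a minimal Python sketch of that general idea; the sample_trace stub, the mean-logprob confidence score, and the keep fraction are my own assumptions for illustration, not the paper's exact recipe:

    import math
    from collections import Counter

    def sample_trace(question: str, seed: int) -> tuple[str, list[float]]:
        """Placeholder: in practice this would call a model (e.g. GPT-OSS-120B)
        with sampling enabled and return (final answer, per-token logprobs of
        the reasoning trace)."""
        fake = [("42", [-0.1, -0.2]), ("42", [-0.3, -0.1]), ("7", [-2.5, -3.0])]
        return fake[seed % len(fake)]

    def trace_confidence(logprobs: list[float]) -> float:
        # Mean token log-probability as a crude per-trace confidence signal.
        return math.exp(sum(logprobs) / len(logprobs))

    def answer_with_confidence(question: str, n_traces: int = 8,
                               keep_frac: float = 0.5) -> str:
        scored = [(trace_confidence(lp), ans)
                  for ans, lp in (sample_trace(question, i) for i in range(n_traces))]
        # Drop low-confidence traces, then majority-vote among the rest.
        scored.sort(reverse=True)
        kept = [ans for _, ans in scored[: max(1, int(n_traces * keep_frac))]]
        return Counter(kept).most_common(1)[0][0]

    print(answer_with_confidence("What is 6 * 7?"))

The point being: the benchmark lift comes from spending more inference-time compute and being smarter about which traces to trust, not from a bigger model.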
▲ | manquer 4 days ago | parent [-] |
Scaling inference, not training, is what OP means, I believe. Smaller startups like Cursor or Windsurf are not competing on foundation model development, so whether new models are generationally better is not relevant to them. Cursor is competing with Claude Code, and both use Claude Sonnet.

Even if Cursor were running an on-par model on its own GPUs, its inference costs would not be as cheap as Anthropic's, simply because it would not be operating at the same scale. Larger data centers mean better deals, and more knowledge about how to run inference well, because they are also doing much larger training runs.