manquer 4 days ago

Scaling inference, not training, is what the OP means, I believe.

The smaller startups like Cursor or Windsurf are not competing on foundation model development, so whether new models are generationally better is not relevant to them.

Cursor is competing with Claude Code, and both use Claude Sonnet.

Even if Cursor were running an on-par model on their own GPUs, their inference costs would not be as cheap as Anthropic's, simply because they would not be operating at the same scale. Larger data centers mean better hardware deals, and more know-how about running inference efficiently, because the big labs are also doing much larger training runs.
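
To make the scale argument concrete, here is a rough back-of-the-envelope sketch of per-token serving cost as a function of GPU-hour price and sustained throughput. The dollar figures and tokens/sec numbers are made-up assumptions, not real quotes; only the shape of the comparison matters: a larger operator pays less per GPU-hour and squeezes more tokens out of each GPU via batching and kernel tuning, so its cost per million tokens ends up several times lower.

    # Hypothetical numbers for illustration only.
    def cost_per_million_tokens(gpu_hour_price: float, tokens_per_second: float) -> float:
        """Cost to serve 1M output tokens on one GPU at a given sustained throughput."""
        seconds_per_million = 1_000_000 / tokens_per_second
        return gpu_hour_price * seconds_per_million / 3600

    # Smaller shop: list-price GPUs, modest batching.
    startup = cost_per_million_tokens(gpu_hour_price=4.00, tokens_per_second=1500)

    # Large lab: negotiated DC pricing, heavy batching and tuned kernels.
    big_lab = cost_per_million_tokens(gpu_hour_price=2.50, tokens_per_second=6000)

    print(f"startup: ${startup:.2f} per 1M tokens")   # ~$0.74
    print(f"big lab: ${big_lab:.2f} per 1M tokens")   # ~$0.12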