simgt 8 hours ago

I still don't understand what the incentive is for releasing genuinely good model weights. What makes sense, however, is OpenAI releasing a somewhat generic model like gpt-oss that games the benchmarks just for PR, or some Chinese companies doing the same to cut the ground out from under American big tech. Are we really hopeful we'll still get decent open-weight models in the future?

mirekrusin 8 hours ago | parent | next

Because there is no money in making them closed.

Open weights mean secondary sales channels, like their fine-tuning service for enterprises [0].

They can't compete with large proprietary providers but they can erode and potentially collapse them.

Open weights and open research build on themselves, advancing their participants and creating an environment that has a shot at competing with proprietary services.

Transparency, control, privacy, cost etc. do matter to people and corporations.

[0] https://mistral.ai/solutions/custom-model-training

NitpickLawyer 8 hours ago | parent | prev | next

> gpt-oss that games the benchmarks just for PR.

gpt-oss is killing the ongoing AIMO 3 competition on Kaggle, which uses a hidden, new set of problems, IMO-level and handcrafted to be "AI hardened". gpt-oss submissions are at ~33/50 right now, two weeks into the competition. The benchmarks (at least for math) were not gamed at all; these models are really good at math.

lostmsu 7 hours ago | parent

Are they ahead of all other recent open models? Is there a leaderboard?

NitpickLawyer 6 hours ago | parent

There is a leaderboard [1], but we'll have to wait until April, when the competition ends, to know which models teams are using. The current number 3 (34/50) has mentioned in the discussions that they're using gpt-oss-120b. Some scores were also shared for gpt-oss-20b, in the 25/50 range.

The next "public" model is qwen30b-thinking at 23/50.

The competition is limited to one H100 (80 GB) and 5 hours of runtime for 50 problems, so the larger open models (DeepSeek, the bigger Qwens) don't fit.

[1] https://www.kaggle.com/competitions/ai-mathematical-olympiad...
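
To make that compute constraint concrete, here is a back-of-the-envelope budget; the decode throughput is an assumed figure, not a number from the competition:

    # Rough per-problem budget under the Kaggle constraint above:
    # one H100 (80 GB), 5 hours of runtime, 50 problems.
    HOURS_TOTAL = 5
    PROBLEMS = 50
    TOKENS_PER_SEC = 100  # assumed decode speed; varies a lot by model and setup

    seconds_per_problem = HOURS_TOTAL * 3600 / PROBLEMS        # 360 s
    tokens_per_problem = seconds_per_problem * TOKENS_PER_SEC  # ~36,000 tokens

    print(f"~{seconds_per_problem:.0f} s and ~{tokens_per_problem:,.0f} tokens per problem")

At an assumed 100 tokens/s, a model that habitually burns tens of thousands of thinking tokens per problem is already brushing against the wall-clock limit.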

data-ottawa 6 hours ago | parent

I find the Qwen3 models spend a ton of thinking tokens, which could hamstring them under the runtime limit. gpt-oss-120b is much more focused and steerable there.

The token-use chart on the release page linked in the OP demonstrates the Qwen issue well.

Token churn does help smaller models on math tasks, but for general-purpose work it seems to hurt.
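
For what it's worth, a minimal sketch of what "steerable" means here, assuming a local OpenAI-compatible server (e.g. vLLM) and the gpt-oss convention of setting reasoning effort in the system prompt; the base_url and model name are placeholders:

    from openai import OpenAI

    # Assumed local OpenAI-compatible endpoint; adjust to your server.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[
            # gpt-oss reportedly takes its reasoning-effort setting from the
            # system prompt, which is what makes the thinking budget steerable.
            {"role": "system", "content": "Reasoning: low"},
            {"role": "user", "content": "What is 17 * 23?"},
        ],
        max_tokens=2048,  # hard cap so a runaway chain of thought can't eat the clock
    )
    print(resp.choices[0].message.content)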

talliman 8 hours ago | parent | prev | next

Until there is a sustainable, profitable, moat-building business model for generative AI, the competition is not about having the best proprietary model but about raising the most VC money, to be well positioned when that business model does arise.

Releasing a near state-of-the-art open model instantly catapults a company to a valuation of several billion dollars, making it possible to raise money, acquire GPUs, and train more SOTA models.

Now, what happens if such a business model does not emerge? I hope we won't find out!

mirekrusin 8 hours ago | parent | next

Explained well in this documentary [0].

[0] https://www.youtube.com/watch?v=BzAdXyPYKQo

simgt 7 hours ago | parent

I was fully expecting that but it doesn't get old ;)

memming 8 hours ago | parent | prev

It’s funny how future money drives the world. Fortunately it’s fueling progress this time around.

prodigycorp 8 hours ago | parent | prev | next

The gpt-oss models are really solid: by far the best at tool calling, and performant.
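
To make the tool-calling claim concrete, here's a minimal OpenAI-style request against a locally served gpt-oss; the endpoint, model name, and get_weather tool are hypothetical:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    # Hypothetical tool schema, just to show the request shape.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": "What's the weather in Ottawa?"}],
        tools=tools,
    )
    # A model that is good at tool calling returns a structured call here
    # instead of answering in prose.
    print(resp.choices[0].message.tool_calls)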

nullbio 7 hours ago | parent | prev

Google games benchmarks more than anyone, hence Gemini's strong benchmark lead. In reality, though, it's still garbage for general use.