surround 5 days ago

> The betting markets were not impressed by GPT-5. I am reading this graph as "there is a high expectation that Google will announce Gemini-3 in August", and not as "Gemini 2.5 is better than GPT-5".

This is an incorrect interpretation. The benchmark which the betting market is based upon currently ranks Gemini 2.5 higher than GPT-5.

theahura 5 days ago

EDIT: I updated the article to account for this perspective.

------

This can't be right -- they're using LMArena without style control to resolve the market, and GPT-5 is ahead there, right? (https://lmarena.ai/leaderboard/text/overall-no-style-control)

> This market will resolve according to the company which owns the model which has the highest arena score based off the Chatbot Arena LLM Leaderboard (https://lmarena.ai/) when the table under the "Leaderboard" tab is checked on August 31, 2025, 12:00 PM ET.

> Results from the "Arena Score" section on the Leaderboard tab of https://lmarena.ai/leaderboard/text with the style control off will be used to resolve this market.

> If two models are tied for the top arena score at this market's check time, resolution will be based on whichever company's name, as it is described in this market group, comes first in alphabetical order (e.g. if both were tied, "Google" would resolve to "Yes", and "xAI" would resolve to "No")

> The resolution source for this market is the Chatbot Arena LLM Leaderboard found at https://lmarena.ai/. If this resolution source is unavailable at check time, this market will remain open until the leaderboard comes back online and resolve based on the first check after it becomes available. If it becomes permanently unavailable, this market will resolve based on another resolution source.

surround 5 days ago

You may have already figured this out, but the leaderboard you linked to (https://lmarena.ai/leaderboard/text/overall-no-style-control) shows gemini-2.5-pro ahead with a score of 1471 compared to gpt-5 at 1462.
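To make the quoted resolution rule concrete, here is a rough sketch of how I read it: take the highest arena score, and if there is a tie, pick the company whose name comes first alphabetically. The `Entry` type and `resolve_market` function are names I made up for illustration, not anything the market or LMArena publishes, and the only data is the two scores mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    company: str      # company name as listed in the market group
    model: str
    arena_score: int  # "Arena Score" with style control off


def resolve_market(entries: list[Entry]) -> str:
    """Return the company the market would resolve to under the quoted rules."""
    if not entries:
        raise ValueError("leaderboard is empty")
    top_score = max(e.arena_score for e in entries)
    # Every company that owns a model tied at the top score.
    tied = [e.company for e in entries if e.arena_score == top_score]
    # Tie-break: alphabetical order of company name (case-insensitive here,
    # which is an assumption; the rules only say "alphabetical order").
    return min(tied, key=str.casefold)


# Scores as shown on the no-style-control leaderboard linked above.
leaderboard = [
    Entry("Google", "gemini-2.5-pro", 1471),
    Entry("OpenAI", "gpt-5", 1462),
]
winner = resolve_market(leaderboard)
print(winner)
```

With those two entries the function returns Google, which matches what the market is pricing in.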

rrhjm53270 5 days ago

It is very interesting that among the top 20 models, all the non-proprietary ones are from China.

tim333 5 days ago

gpt-5 was ahead on that leaderboard last night.

surround 5 days ago

The leaderboard hasn't changed since it was updated to add gpt-5. Here's what it looked like yesterday: https://archive.is/XIrbN

If you saw gpt-5 ahead, you might have been looking at the leaderboard with style control: https://lmarena.ai/leaderboard/text/overall

JimDabell 5 days ago

> This is an incorrect interpretation. The benchmark which the betting market is based upon currently ranks Gemini 2.5 higher than GPT-5.

You can see from the graph that Google shot way up from ~25% to ~80% upon the release of GPT-5. Google’s model didn’t suddenly get way better at any benchmarks, did it?

dcre 5 days ago

It's not about Google's model getting better. The point is that gpt-5's score is lower than the one Gemini 2.5 Pro already had before gpt-5 came out (on the particular metric that determines this bet: Overall Text without Style Control).

https://lmarena.ai/leaderboard/text/overall-no-style-control

That graph shows a probability. The fact that it's not 100% reflects the possibility that gpt-5 or some other model will improve enough by the end of the month to beat Gemini.