Obertr 5 days ago

At this point I'm starting to believe OAI is very much behind in the model race, and it can't be reversed

The image model they released is much worse than Nano Banana Pro; there was no Ghibli moment this time

Their GPT 5.2 is obviously overfit on benchmarks; that's the consensus among many developers and friends I know. So Opus 4.5 stays on top when it comes to coding

The weight of Google's ad money, the general direction, and Brin's founder instincts brought the massive giant back to life. None of my company's workflows run on OAI GPT right now. We love their Agents SDK, but after the Claude Agent SDK it feels like peanuts.

avazhi 5 days ago | parent | next [-]

"At this point I'm starting to believe OAI is very much behind in the model race, and it can't be reversed"

This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.

drawnwren 5 days ago | parent | next [-]

OAI also got talent-mined. Their top intellectual leaders left after fights with sama, then Meta took a bunch of their mid-senior talent. Google went the opposite direction: they brought Noam and Sergey back.

mmaunder 5 days ago | parent | prev [-]

Yeah, the only thing standing in Google's way is Google. And it's the easy stuff: sensible billing models, easy-to-use docs, consoles that make sense and don't require 20 hours to learn and navigate, and then the slew of basic usability and model-API-interaction bugs in Gemini CLI. The only differentiator that OpenAI still has is polish.

Edit: And just to add an example: OpenAI's Codex CLI billing is easy for me. I just sign up for the base package and add extra credits, which are automatically used once I'm through my weekly allowance. With Gemini CLI I'm using my OAuth account, and then having to rotate API keys once I've used that up.

Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.

Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.

Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.

mips_avatar 5 days ago | parent | next [-]

I'd be curious how many people use OpenRouter BYOK just to avoid figuring out the cloud consoles for GCP/Azure.

vanviegen 4 days ago | parent | next [-]

Openrouter is great! Prepaid, no surprise bills. Easily switch between any models you desire. Dead simple interface. Reliable. What's not to like?
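To illustrate the model-switching point: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so swapping providers is just a different `model` string in the same request. A minimal stdlib-only sketch; the model slug and key below are illustrative placeholders, not verified values:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completion request for a given OpenRouter model slug."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Same call shape for any provider's model; only the slug changes.
req = build_request("anthropic/claude-opus-4.5", "Hello", "sk-or-...")
# urllib.request.urlopen(req)  # uncomment with a real key to actually send it
```

The point is that switching from one lab's model to another is a one-string change, rather than a new console, SDK, and billing setup per provider.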

metadat 4 days ago | parent [-]

With OpenRouter it can be unclear if you're getting a quantized model or not.

mmaunder 5 days ago | parent | prev | next [-]

Agreed. It's ridiculous.

visarga 5 days ago | parent | prev [-]

I do. Gave up using Gemini directly.

mips_avatar 5 days ago | parent [-]

I mean, I do too. I had a really odd Gemini bug until I went BYOK on OpenRouter.

ewoodrich 4 days ago | parent | prev [-]

Gemini CLI via a Google One plan is the regular consumer billing flow which is pretty straightforward.

GenerWork 5 days ago | parent | prev | next [-]

I'm actually liking 5.2 in Codex. It's able to take my instructions, do a good job of planning out the implementation, and ask me relevant questions about interactions and functionality. It also gives me more tokens than Claude for the same price. Now, I'm trying to white-label something I made in Figma, so my use case is a lot different from the average person's on this site, but so far it's my go-to and I don't see any reason to switch at this time.

gpt5 5 days ago | parent [-]

I've noticed when it comes to evaluating AI models, most people simply don't ask difficult enough questions. So everything is good enough, and the preference comes down to speed and style.

It's when it gets difficult, as in the coding case you mentioned, that we can see OpenAI still has the lead. The same is true for the image model: prompt adherence is significantly better than Nano Banana's, especially on more complex queries.

int_19h 5 days ago | parent | next [-]

I'm currently working on a Lojban parser written in Haskell. This is a fairly complex task that requires a lot of reasoning. And I tried out all the SOTA agents extensively to see which one works the best. And Opus 4.5 is running circles around GPT-5.2 for this. So no, I don't think it's true that OpenAI "still has the lead" in general. Just in some specific tasks.

GenerWork 5 days ago | parent | prev | next [-]

I'd argue that 5.2 just barely squeaks past Sonnet 4.5 at this point. Before this was released, 4.5 absolutely beat Codex 5.1 Medium and could pretty much oneshot UI items as long as I didn't try to create too many new things at once.

fellowniusmonk 5 days ago | parent | prev [-]

I have a very complex set of logic puzzles I run through my own tests.

My logic tests, plus trying to get an agent to develop a certain type of ** implementation (one that is published, and thus one the model has been trained on to some limited extent), really stress-test models. 5.2 is a complete failure here, a clear case of overfitting.

Really, really bad, in an unrecoverable-infinite-loop way.

It helps when you have existing working code that you know a model can't be trained on.

It doesn't actually evaluate the working code; it just assumes it's wrong and starts trying to rewrite it as a different type of **.

Even linking it to the explanation and the git repo of the reference implementation it still persists in trying to force a different **.

This is the worst model since before o3. Just terrible.

int32_64 5 days ago | parent | prev | next [-]

Is there a "good enough" endgame for LLMs and AI where benchmarks stop mattering because end users don't notice or care? In such a scenario brand would matter more than the best tech, and OpenAI is way out in front in brand recognition.

crazygringo 5 days ago | parent | next [-]

For average consumers, I think very much yes, and this is where OpenAI's brand recognition shines.

But for anyone using LLMs to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters, the differences very much matter. And benchmarks just confirm your personal experience anyway; the differences between models become extremely apparent when you're working in a niche sub-subfield and one model shows glaring informational or logical errors while another mostly gets it right.

And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)

smashed 5 days ago | parent | next [-]

I haven't seen any LLM tech shine "where every detail matters".

In fact, so far they consistently fail in exactly these scenarios, glossing over random important details whenever you double-check results in depth.

You might have found models, prompts, or workflows that work for you, though; if so, I'm interested.

bitpush 5 days ago | parent | prev | next [-]

> OpenAI's brand recognition shines.

We've seen this movie before. Snapchat was the darling. In fact, it invented the entire category and dominated the format for years. Then it ran out of time.

Now very few people use Snapchat, and it has been reduced to a footnote in history.

If you think I'm exaggerating, that just proves my point.

decimalenough 5 days ago | parent [-]

Not a great example: Snapchat made it through the slump, successfully captured the next generation of teenagers, and now has around 500M DAUs.

bitpush 5 days ago | parent [-]

You might not remember, but Snapchat was once supposed to take on Facebook. The founders were so cocky that they declined Facebook's acquisition offer because they thought they could be bigger.

I never said Snapchat is dead. It lives on, but as a shell of its former self. They had no moat, and the competitors caught up: Instagram, WhatsApp, and even LinkedIn copied Snapchat's stories, and the rest is history.

5 days ago | parent | prev [-]
[deleted]
xbmcuser 5 days ago | parent | prev | next [-]

Google's biggest advantage over time will be cost. They have their own hardware, which they can and will optimize for their LLMs. And Google has experience winning market share over time by offering better results, performance, or storage, e.g. Gmail vs. Hotmail/Yahoo, Chrome vs. IE/Firefox. So don't discount them: if the quality is better, they will get ahead over time.

int_19h 5 days ago | parent [-]

It already is cost. Their Pro plan has much more generous limits than both OpenAI's and especially Anthropic's. You get 20 Deep Research queries per day with Pro, for example.

rfw300 5 days ago | parent | prev | next [-]

That might be true for a narrow definition of chatbots, but they aren't going to survive on name recognition if their models are inferior in the medium term. Right now, "agents" are only really useful for coding, but when they start to be adopted for more mainstream tasks, people will migrate to the tools that actually work first.

holler 5 days ago | parent | prev | next [-]

This. I don't know any non-tech people who use anything other than ChatGPT. On a similar note, I've wondered why Amazon doesn't make a ChatGPT-like app with their latest Alexa+ makeover; it seems like a missed opportunity. The Alexa app has a feature to talk to the LLM in chat mode, but the overall app is geared toward managing devices.

macNchz 5 days ago | parent | next [-]

Google has great distribution to be able to just put Gemini in front of people who are already using their many other popular services. ChatGPT definitely came out of the gate with a big lead on name recognition, but I have been surprised to hear various non-techy friends talking about using Gemini recently, I think for many of them just because they have access at work through their Workspace accounts.

Obertr 5 days ago | parent | prev | next [-]

Most of Europe is full of Gemini ads. My parents use Gemini because it's free and it popped up in a YouTube ad before a video.

Just go outside the bubble, and look at slightly older people too

ewoodrich 4 days ago | parent [-]

Yeah my parents never really cared enough to explore ChatGPT despite hearing about it 10 times a day in news/media for the last few years. But recently my mom started using Google's AI Search mode after first trying it while doing research for house hunting and my dad uses the Gemini app for occasional questions/identifying parts and stuff (he has always loved Google Lens so those sort of interactive multimedia features are the main pull vs plain text chatbot conversations).

They are both Android/Google Search users so all it really took was "sure I guess I'll try that" in response to a nudge from Google. For me personally I have subscriptions to Claude/ChatGPT/Gemini for coding but use Gemini for 90% of chatbot questions. Eventually I'll cancel some of them but will probably keep Gemini regardless because I like having the extra storage with my Google One plan bundle. Google having a pre-existing platform/ecosystem is a huge advantage imo.

nimchimpsky 5 days ago | parent | prev [-]

[dead]

fullstick 5 days ago | parent | prev | next [-]

I doubt anyone I know who is using llms outside of work knows that there are benchmark tests for these models.

jay_kyburz 5 days ago | parent | prev [-]

This is why both google and microsoft are pushing Gemini and Copilot in everyone's face.

dieortin 5 days ago | parent | prev | next [-]

Is there anything pointing to Brin actually having had a hand in Google’s turnaround in AI? I hear a lot of people saying this, but no one explaining why.

novok 5 days ago | parent | next [-]

In organizations, everyone's existence and position is politically supported by their internal peers at around their level. Even Google's and Microsoft's current CEOs are supported by their groups of co-executives and other key players. The fact that both have agreeable personalities is not an accident! They both need to keep that balance to stay in power, and that means not destroying or disrupting your peers' current positions. Everything is effectively decided by informal committee.

Founders are special because they are not beholden to this social support network to stay in power, and they have a mythos that socially supports their actions beyond their pure power position. The only others they are beholden to are their co-founders, and in some cases major investor groups. This gives them the ability to disregard the social balance, because they do not depend on it to stay in power. Their power source is external to the organization, while everyone else's is internal to it.

This gives them a very special "do something" ability that nobody else has. It can lead to failures (Zuck and Oculus, Snapchat Spectacles) or successes (Steve Jobs, Gemini), but either way, it lets them actually do something.

JumpCrisscross 5 days ago | parent [-]

> Founders are special, because they are not beholden to this social support network to stay in power

Of course they are. Founders get fired all the time. As often as non-founder CEOs purge competition from their peers.

> The only others they are beholden too are their co-founders, and in some cases major investor groups

This describes very few successful executives. Even with your co-founders and investors on board, if your talent and customers hate you, they'll fuck off.

ryoshu 5 days ago | parent | prev | next [-]

If he's having an impact it's because he can break through the bureaucracy. He's not trying to protect a fiefdom.

HarHarVeryFunny 5 days ago | parent | prev [-]

I would say it goes back more to the Google Brain + DeepMind merger, creating Google DeepMind headed by Demis Hassabis.

The merger happened in April 2023.

Gemini 1.0 was released in Dec 2023, and the progress since then has been rapid and impressive.

raincole 5 days ago | parent | prev | next [-]

That's quite a sensationalized view.

The Ghibli moment was only about half a year ago. At that point, OpenAI was far ahead in image editing. Now it's been behind for a few months and "it can't be reversed"?

Obertr 5 days ago | parent | next [-]

Check the size and budget of Google's initiatives. It's unlimited.

akie 4 days ago | parent [-]

Google basically has unlimited budget and unlimited data. If they're ahead now, which I believe they are, they'll be very very difficult to catch.

BoredPositron 5 days ago | parent | prev [-]

The Ghibli moment was an influencer fad, not a real advancement.

JumpCrisscross 5 days ago | parent | prev | next [-]

> I start to believe OAI is very much behind

Kara Swisher recently compared OpenAI to Netscape.

Andrex 4 days ago | parent [-]

Ouch.

Maybe we'll get some awesome FOSS tech out of its ashes?

JumpCrisscross 4 days ago | parent [-]

We’ll get a bail-out and then a massive data-centre and energy-production build-out.

baq 5 days ago | parent | prev | next [-]

GPT 5.2 is actually getting me better outputs than Opus 4.5 on very complex reviews (on high, I never use less) - but the speed makes Opus the default for 95% of use cases.

yieldcrv 5 days ago | parent | prev | next [-]

the trend I've seen is that none of these companies is behind in concept or theory; they just spend longer intervals baking a superior foundation model

so they get lapped a few times and then drop a fantastic new model out of nowhere

the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc

they're all shuffling the same talent around; it's California, that's how it goes. the companies have the same institutional knowledge, at least regarding their consumer-facing options

aswegs8 4 days ago | parent | prev | next [-]

Not sure why they don't just replicate the workflow that Nano Banana Pro uses: it lets the thinking model generate a detailed description and then renders that image. When I use a ChatGPT thinking model and then render an image, I also get pretty good results. It's not as creative or flexible as Nano Banana Pro, but it produces really useful results.
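The two-stage flow described above can be sketched roughly like this. `call_text_model` and `call_image_model` are hypothetical stand-ins for whatever SDK you use; the point is the pipeline shape (expand first, render second), not any particular API:

```python
def expand_prompt(call_text_model, user_prompt: str) -> str:
    """Stage 1: have the thinking model write a detailed scene description."""
    instruction = (
        "Rewrite this image request as a detailed description covering "
        "subject, composition, lighting, style, and background:\n\n"
    )
    return call_text_model(instruction + user_prompt)

def generate_image(call_text_model, call_image_model, user_prompt: str):
    """Stage 2: render the expanded description, not the raw user prompt."""
    detailed = expand_prompt(call_text_model, user_prompt)
    return call_image_model(detailed)
```

The win is that prompt-engineering effort lives in the text model, which is better at reasoning about intent, while the image model only ever sees an already-detailed specification.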

louiereederson 5 days ago | parent | prev | next [-]

i think the most important part of google vs openai is slowing usage of consumer LLMs. people focus on gemini's growth, but overall LLM MAUs and time spent is stabilizing. in aggregate it looks like a complete s-curve. you can kind of see it in the table in the link below but more obvious when you have the sensortower data for both MAUs and time spent.

the reason this matters is slowing velocity raises the risk of featurization, which undermines LLMs as a category in consumer. cost efficiency of the flash models reinforces this as google can embed LLM functionality into search (noting search-like is probably 50% of chatgpt usage per their july user study). i think model capability was saturated for the average consumer use case months ago, if not longer, so distribution is really what matters, and search dwarfs LLMs in this respect.

https://techcrunch.com/2025/12/05/chatgpts-user-growth-has-s...

random9749832 5 days ago | parent | prev | next [-]

This is obviously trained on Pro 3 outputs for benchmaxxing.

CuriouslyC 5 days ago | parent | next [-]

Not trained on pro, distilled from it.

viraptor 5 days ago | parent [-]

What do you think distilled means...?

CuriouslyC 5 days ago | parent [-]

It's good to keep the language clear, because you could pretrain/sft on outputs (as many labs do), which is not the same thing.

NitpickLawyer 5 days ago | parent | prev [-]

> for benchmaxxing.

Out of all the big4 labs, google is the last I'd suspect of benchmaxxing. Their models have generally underbenched and overdelivered in real world tasks, for me, ever since 2.5 pro came out.

encroach 5 days ago | parent | prev | next [-]

OAI's latest image model outperforms Google's in LMArena in both image generation and image editing. So even though some people may prefer nano banana pro in their own anecdotal tests, the average person prefers GPT image 1.5 in blind evaluations.

https://lmarena.ai/leaderboard/text-to-image

https://lmarena.ai/leaderboard/image-edit

Obertr 5 days ago | parent [-]

Add to this Gemini's distribution, advertised by Google in all of their products, and the average Joe will pick the sneakers on the shelf near the checkout rather than the healthier option in the back

gdhkgdhkvff 5 days ago | parent | next [-]

Those darn sneakers are just too delicious!

encroach 5 days ago | parent | prev | next [-]

That's not how the arena works. The evaluation is blind so Google's advertising/integration has no effect on the results.

Obertr 5 days ago | parent [-]

3 points, sure

encroach 5 days ago | parent [-]

Right, it only scores 3 points higher on image edit, which is within the margin of error. But on image generation, it scores a significant 29 points higher.

raincole 5 days ago | parent | prev [-]

...and what does this have to do with the comment you replied to? Did you reply to the wrong person, or were you just stating unrelated factoids?

nightski 5 days ago | parent | prev [-]

Google has incredible tech. The problem is, and always has been, their products. Not only are they generally designed to be anti-consumer, but Google goes out of its way to make them as hard to use as possible. The debacle with Antigravity exfiltrating data is just one example among countless.

novok 5 days ago | parent [-]

The Antigravity case feels like a pure bug and a rush to market; they had a bunch of other bugs showing that. That's not anti-consumer, and it's not deliberately making things difficult.

5 days ago | parent [-]
[deleted]