I've been extremely impressed (and actually had quite a good time) with GPT-5 and Codex so far. It seems to handle long context well, does a great job researching the code, never leaves things half-done (with long tasks it may leave some steps for later, but it never does 50% of a step and then just randomly mock a function like Gemini used to), and gives me good suggestions if I'm trying to do something I shouldn't. And the Codex CLI also seems to be getting constant, meaningful updates.

▲

mmaunder a day ago | parent | next [-]

Agreed. We're hardcore Claude Code users and my CC usage trended down to zero pretty quickly after I started using Codex. The new model updates today are great. Very well done OpenAI team!! CC was an existential threat. You responded and absolutely killed it. Your move Anthropic.

▲

Jcampuzano2 a day ago | parent | next [-]

To be fair, Anthropic kinda did this to themselves. I consider it as a pretty massive throw on their end in terms of the fairly tight grasp they had on developer sentiment.

Everyone else slowly caught up and/or surpassed them while they simultaneously had quality control issues and service degradation plaguing their system - ALL while having the most expensive models comparatively in terms of intelligence.

▲

mmaunder a day ago | parent | next [-]

Agreed. I really wish Google would get their act together because I think they have the potential of being faster, cheaper with bigger context windows. They're so great at hardcore science and engineering, but they absolutely suck at products.

▲

bobbylarrybobby 16 hours ago | parent | next [-]

Google can do anything but get their act together.

▲

bjackman 21 hours ago | parent | prev | next [-]

I think this is being downvoted coz it doesn't seem to be really responding to the thread, and maybe it isn't, but for anyone who hasn't tried Gemini CLI:

My experience after a month or so of heavy use is exactly this. The AI is rock solid. I'm pretty consistently impressed with its ability to derive insights from the code, when it works. But the client is flaky, the backend is flaky, and the overall experience for me is always "I wish I could just use Claude".

Say 1 in 10 queries craps out (often the client OOMs even though I have 192Gb of RAM). Sounds like a 10% reliability issue but actually it just pushes me into "fuck this I'll just do it myself" so it knocks out like 50% of the value of the product.

(Still, I wouldn't be surprised if this can be fixed over the next few months, it could easily be very competitive IMO).

▲

macNchz 18 hours ago | parent | next [-]

I have been heavily using the Gemini API via Aider for a few months and it has been absolutely stable. Claude, in comparison, has been much flakier. OpenAI somewhere in between.

	▲	bjackman 11 hours ago \| parent [-]
		It's definitely possible there's a "grass is always greener" effect going on here, to be fair. None of these tools give the impression of being well-tested software. My guess is that neither OpenAI nor Anthropic actually has the necessary density in expertise to build quality software. Google obviously can build good software _when it really wants to_ but in this space its strategy looks like "build the products the other guys are building, cut whatever corners necessary to do this absolutely as fast as possible". So even if my initial impressions are more accurate it's quite possible Google wins long term here.

▲

faxmeyourcode 21 hours ago | parent | prev | next [-]

Semi-related but I have the same experience with the gemini mobile app on android. ChatGPT and Claude are both great user experiences and the best word to describe how the gemini app feels is flaky.

▲

dumpsterdiver 17 hours ago | parent | prev [-]

Just adding my two cents after test driving Gemini Ultra after being a long time ChatGPT Pro subscriber:

Remember the whole “Taken 3 makes Taken 2 look like Taken 1” meme? Well Google’s latest video generating AI makes any video gen AI I’ve seen up until now look like Taken 3* (sigh, I said 1, ruined it) - and they are seriously impressive on their own.

Edit: By “they” I mean the other video generating AI makes models, not the other Taken movies. I hope Liam Neeson doesn't read HN, because a delivery like that might not make him laugh.

▲

echelon 21 hours ago | parent | prev [-]

I really do not want Google to win anything. They're a giant monopoly across multiple industries. We need a greater balance of power.

Antitrust enforcement has been letting us down for over two decades. If we don't have an oxygenation event, we'll go an entire generation where we only reward tax-collecting, non-innovation capital. That's unhealthy and unfair.

Our career sector has been institutionalized and rewards the 0.001% even as they rest on their laurels and conspire to suppress wages and innovation. There's a reason why centicorns petered out and why the F500 is tech-heavy. It's because big tech is a dragnet that consumes everything it touches - film studios, grocery stores, and God only knows what else it'll assimilate in the unending search for unregulated, cancerous growth.

FAANG's $500k TC is at the expense of hundreds of unicorns making their ICs even wealthier. That money mostly winds up going to institutional investors, where the money sits parked instead of flowing into huge stakes risks and cutthroat competition. That's why a16z and YC want to see increased antitrust regulations.

But it's really bad for consumers too. It's why our smartphones are stagnant taxation banana republics with one of two landlords. Nothing new, yet as tightly controlled an authoritarian state. New ideas can't be tried and can't attain healthy margins.

It's wild that you can own a trademark, but the only way for a consumer to access it is to use a Google browser that defaults to Google search (URLs are scary), where the search results will be gamed by competitors. You can't even own your own brand anymore.

Winning shouldn't be easy. It should be hard. A neverending struggle that rewards consumers.

We need a forest fire to renew the ecosystem.

▲

andai 20 hours ago | parent [-]

Google supposedly claimed to have no moat, but they actually have

- all the users

- all the apps (Google, GMail, YouTube, Docs, Maps...)

- all the books (Google Books)

- all the video (YouTube)

- all the web pages

- custom hardware

It's honestly weird they aren't doing better. Agree that the models are great and the UX is bad all around.

▲

brianjking 16 hours ago | parent | next [-]

Hey now, let's not forget it. They also have:

- all the lobbyists - all the money

▲

LordDragonfang 19 hours ago | parent | prev [-]

Google has been, for at least a decade, making pretty terrible choices that squander developer and power-user goodwill (see: any thread where they announce a new product and one of the top comments will link to killedbygoogle). When you've burnt bridges with your biggest evangelists, adoption by normies slows, and your products appear to stagnate.

Unfortunately, they've been insulated from the consequences of their bad decisions by the fact the money printer (ads) keeps their company afloat and mollifies shareholders. The moment that dries up, they're in trouble.

	▲	echelon 19 hours ago \| parent [-]
		We say this (I admit I would say the same as you), and yet their revenue is $400 billion a year. I don't think they care what we think. They're thriving despite our protests. But yeah, they shouldn't be shielded from antitrust. They have literally everything.

▲

zamalek 19 hours ago | parent | prev [-]

You're absolutely right!

▲

notfromhere 20 hours ago | parent | prev | next [-]

Gpt5 writes clean, simple code and listens to instructions. I went from tons of Claude APi usage to usage to basically none overnight

	▲	ttul 17 hours ago \| parent [-]
		Agreed. GPT’s coding is so much cleaner. Claude tends to ramble and generate unnecessary scaffolding. GPT’s code is artful and minimalist.

▲

epolanski 18 hours ago | parent | prev | next [-]

But how do you use it?

It's super annoying that it doesn't provide a way to approve edits one by one instead it either vibe codes on its own or gives me diffs to copy paste.

Claude code has a much saner "normal mode".

▲

brianjking 16 hours ago | parent [-]

Wait, this wasn't what I was experiencing. Did something change in gpt-5-codex or was that your normal experience?

	▲	epolanski 9 hours ago \| parent [-]
		I asked you how do you use it. Is it via CLI? Is it via extension to an editor? What is your flow?

▲

ttul 17 hours ago | parent | prev [-]

This just goes to show how crucial it was for Anthropic and OpenAI to hire first class product leads. You can’t just pay the AI engineers $100M. Models alone don’t generate revenue.

	▲	dwohnitmok 15 hours ago \| parent \| next [-]
		I got the exact opposite lesson. The parent and grandparent comments seem to be talking about dropping one product for another purely on the strength of the model.
	▲	arthurcolle 16 hours ago \| parent \| prev [-]
		the model is the product

▲

vitorgrs 14 hours ago | parent | prev | next [-]

Gemini seems to be pretty awful as agentic coding. It always finish the task, and when I see the result, it just breaks my code.

Not sure the fault it's "doing bad code", I guess it's just not being good at being agentic. Saw this on Gemini CLI and other tools.

GLM, Kimi, Qwen-Code all behaves better for me.

Probably Gemini 3 will fix this, as Gemini 2.5 Pro it's "old" by now.

	▲	faangguyindia 13 hours ago \| parent [-]
		Gemini CLI is bad, model itself is really good.

▲

robotswantdata 21 hours ago | parent | prev | next [-]

Agreed ditched my Claude code max for the $200 pro ChatGPT.

Gemini cli is too inconsistent, good for documentation tasks. Don’t let it write code for you

▲

icelancer 20 hours ago | parent [-]

Gemini's tool calling being so bad is pretty amazing. Hopefully in the next iteration they fix it, because the model itself is very good.

▲

nowittyusername 17 hours ago | parent | next [-]

This is a recurring theme with Google. Their models are phenomenal but the systems around them are so bad that it degrades the whole experience. Veo3 great model horrible website, and so on...

	▲	brianjking 16 hours ago \| parent [-]
		Their massive increase in token processing since Veo3 and nano banana have been released would say otherwise... Or we're all just used to eating things we don't like and smiling.

▲

robbrulinski 19 hours ago | parent | prev [-]

That has been my experience as well with every Gemini model, ugh!

▲

DanielVZ 17 hours ago | parent | prev | next [-]

Can someone compare it to cursor? So far i see people compare it with Claude code but I’ve had much more success and cost effectiveness with cursor than Claude code

	▲	bionhoward 15 hours ago \| parent [-]
		Doesn’t compare, because Cursor has a privacy mode. Why would anyone want to pay OpenAI or Anthropic to train their bots on your business codebase? You know where that leads? Unemployment!

▲

EnPissant a day ago | parent | prev | next [-]

My experience with Codex / Gpt-5:

- The smartest model I have used. Solves problems better than Opus-4.1.

- It can be lazy. With Claude Code / Opus, once given a problem, it will generally work until completion. Codex will often perform only the first few steps and then ask if I want to continue to do the rest. It does this even if I tell it to not stop until completion.

- I have seen severe degradation near max context. For example, I have seen it just repeat the next steps every time I tell it to continue and I have to manually compact.

I'm not sure if the problems are Gpt-5 or Codex. I suspect a better Codex could resolve them.

▲

brookst a day ago | parent | next [-]

Claude seems to have gotten worse for me, with both that kind of laziness and a new pattern where it will write the test, write the code, run the test, and then declare that the test is working perfectly but there are problems in the (new) code that need to be fixed.

Very frustrating, and happening more often.

▲

elliot07 a day ago | parent [-]

They for sure nerfed it within the last ~3 weeks. There's a measurable difference in quality.

	▲	conception a day ago \| parent [-]
		They actually just had a bug fix and it seems like it recently got a lot better in the last week or so

▲

M4v3R a day ago | parent | prev | next [-]

Context degradation is a real problem with all frontier LLMs. As a rule of thumb I try to never exceed 50% of available context window when working with either Claude Sonnet 4 or GPT-5 since the quality drops really fast from there.

▲

darkteflon 21 hours ago | parent | next [-]

Agreed, and judicious use of subagents to prevent pollution of the main thread is another good mitigant.

▲

faangguyindia 13 hours ago | parent | prev | next [-]

I cap my context at 50k tokens.

▲

EnPissant a day ago | parent | prev [-]

I've never seen that level of extreme degradation (just making a small random change and repeating the same next steps infinitely) on Claude Code. Maybe Claude Code is more aggressive about auto compaction. I don't think Codex even compacts without /compact.

▲

Jcampuzano2 a day ago | parent [-]

I think some of it is not necessarily auto compaction but the tooling built in. For example claude code itself very frequently builds in to remind the model what its working on and should be doing which helps always keeps its tasks in the most recent context, and overall has some pretty serious thought put into its system prompt and tooling.

But they have suffered quite a lot of degradation and quality issues recently.

To be honest unless Anthropic does something very impactful sometime soon I think they're losing their moat they had with developers as more and more jump to codex and other tools. They kind of massively threw their lead imo.

	▲	EnPissant a day ago \| parent [-]
		Yeah, I think you are right.

▲

apigalore 16 hours ago | parent | prev | next [-]

Yes, this is the one thing stopping me from going to Codex completely. Currently, it's kind of annoying that Codex stops often and asks me what to do, and I just reply "continue". Even though I already gave it a checklist.

With GPT‑5-Codex they do write: "During testing, we've seen GPT‑5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation." https://openai.com/index/introducing-upgrades-to-codex/

▲

bayesianbot a day ago | parent | prev | next [-]

I definitely agree with all of those points. I just really prefer it completing steps and asking me if we should continue to next step rather than doing half of the step and telling me it's done. And the context degradation seems quite random - sometimes it hits way earlier, sometimes we go through crazy amount of tokens and it all works out.

▲

tanvach a day ago | parent | prev [-]

I also noticed the laziness compared to Sonnet models but now I feel it’s a good feature. Sonnet models, now I realize, are way too eager to hammer out code with way more likelihood of bugs.

▲

mritchie712 a day ago | parent | prev | next [-]

Have you used Claude Code? How does it compare?

▲

mmaunder a day ago | parent [-]

It's objectively a big improvement over Claude Code. I'm rooting for anthropic, but they better make a big move or this will kill CC.

▲

mike_hearn 9 hours ago | parent | next [-]

Are you talking about Codex CLI or their GitHub integration?

GPT-5 is a great model. I tried Codex CLI Rust, as they seem to be deprecating the JS version, and it is awful. I don't know what possessed them to try and write a TUI in Rust but it isn't working. The Claude Code UI is hugely superior.

▲

nightshift1 21 hours ago | parent | prev [-]

What are the usage limits like compared to Claude Code? Is it more like 5× or 20×? For twice the price, it would have to be very good.

▲

naiv 20 hours ago | parent [-]

https://help.openai.com/en/articles/11369540-using-codex-wit...

have to say not sure what this even means and what the exact definition of a message is in this context.

with claude code max20 I was constantly hitting limits, with codex not once yet

	▲	mmaunder 19 hours ago \| parent [-]
		Same. We're not hitting limits at all with Codex and it's ridiculously good at managing and preserving its context window while getting a metric fuckton of work done. It's kind of unbelievable actually. I don't know re billing. Not my dept.

▲

troupo 11 hours ago | parent | prev | next [-]

> then just randomly mock a function like Gemini used to

Claude Code does that on longer tasks.

Time to give Codex a try I guess.

▲

FergusArgyll a day ago | parent | prev | next [-]

It doesn't seem to have any internal tools it can use. For example, web search; It just runs curl in the terminal. Compared to Gemini CLI that's rough but it does handle pasting much better... Maybe I'm just using both wrong...

▲

Tiberium 21 hours ago | parent | next [-]

It does have web search - it's just not enabled by default. You can enable it with --search or in the config, then it can absolutely search, for example finding manuals/algorithms.

	▲	FergusArgyll 7 hours ago \| parent [-]
		Thanks!

▲

gizmodo59 a day ago | parent | prev | next [-]

Use --search option when you start codex

	▲	FergusArgyll 7 hours ago \| parent [-]
		Thanks!

▲

ollybee a day ago | parent | prev [-]

web search too is off by default

▲

catlover76 a day ago | parent | prev [-]

[dead]