Remix.run Logo
brianyu8 9 hours ago

I am super bullish on claude code / codex cli + LSP and other deterministic codemod and code intelligence tools.

I was playing around with codex this weekend and honestly having a great time (my opinion of it has 180'd since gpt-5.2(-codex) came out) but I was getting annoyed at it because it kept missing references when I asked it to rename or move symbols. So I built a skill that teaches it to use rope for mechanical python codebase refactors: https://github.com/brian-yu/python-rope-refactor

Been pretty happy with it so far!

lionkor 5 hours ago | parent | next [-]

OpenAI engineer fails to rename references because his F2 key has been replaced with the Copilot button?

No LSP support is wild.

shimman 5 hours ago | parent [-]

This is something I notice often when using these tools (if this is what you are referring too). Like they will grep entire code bases to search for a word rather than search by symbol. I suppose they don't care to fix these types of things as it all adds up to paid tokens in the end.

We have 50 years worth of progress on top of grep and grep is one of the worse ways to refactor a system.

Nice to see LLM companies are ignoring these teachings and speed running into disaster.

shepherdjerred 9 hours ago | parent | prev | next [-]

Are you having a positive experience with Codex compared to Claude Code? Codex in my brief experience was... not good w/ 5.1

cube2222 9 hours ago | parent | next [-]

Just to provide another datapoint - tried codex September / October after seeing the glowing reviews here, and it was, all in all, a huge letdown.

It seems to be very efficient context-wise, but at the same time made precise context-management much harder.

Opus 4.5 is quite a magnificent improvement over Sonnet 4.5, in CC, though.

Re tfa - I accidentally discovered the new lsp support 2 days ago on a side project in rust, and it’s working very well.

fluidcruft 3 hours ago | parent | next [-]

Similar experience and timeline with codex, but tried it last week and it's gotten much better in the interim. Codex with 5.2 does a good job at catching (numerical) bugs that Opus misses. I've been comparing them and there's not a clear winner, GPT 5.2 misses things Opus finds and vice versa. But claude-code is still a much better experience and continues to just keep getting better but codex is following, just a few months behind.

allisdust 9 hours ago | parent | prev [-]

Another anecdote/datapoint. Same experience. It seem to mask a lot of bad model issues by not talking much and overthinking stuff. The experience turns sour the more one works with it.

And yes +1 for opus. Anthropic delivered a winner after fucking up the previous opus 4.1 release.

theshrike79 6 hours ago | parent | prev | next [-]

It goes like this:

Codex is an outsourcing company, you give specs, they give you results. No communication in between. It's very good at larger analysis tasks (code coverage, health etc). Whatever it does, it does it sloooowwwllyyy.

Claude is like a pair programmer, you can follow what it's doing, interrupt and redirect it if it starts going off track. It's very much geared towards "get it done" rather than maximum code quality.

aschobel 5 hours ago | parent | prev | next [-]

I’m basically only using the Codex CLI now. I switched around the GPT-5 timeframe because it was reliably solving some gnarly OpenTelemetry problems that Claude Code kept getting stuck on.

They feel like different coworker archetypes. Codex often does better end-to-end (plan + code in one pass). Claude Code can be less consistent on the planning step, but once you give it a solid plan it’s stellar at implementation.

I probably do better with Codex mostly due to familiarity; I’ve learned how it “thinks” and how to prompt it effectively. Opus 4.5 felt awkward for me for the same reason: I’m used to the GPT-5.x / Codex interaction style. Co-workers are the inverse, they adore Opus 4.5 and feel Codex is weird.

__mharrison__ 4 hours ago | parent | prev [-]

I've gone it works wonderful for 5.2. I think chatgpt plus is at the top of the weekly AI rolling wars. Most bang for the buck.

frays 9 hours ago | parent | prev [-]

Interesting to see that you work at OpenAI but had to build a skill like this yourself.

Surprised that you don't have internal tools or skills that could do this already!

Shows how much more work there is still to be done in this space.

voiper1 8 hours ago | parent | next [-]

My theory is that even if the models are frozen here, we'll still spend a decade building out all the tooling, connections, skills, etc and getting it into each industry. There's so much _around_ the models that we're still working on too.

nonethewiser 27 minutes ago | parent [-]

Agree completely. It's already been like this for 1-2 years even. Things are finally starting to get baked in but its still early. For example, AI summaries of product reviews, gemini youtube video summaries, etc..

Its hard to quantify what sort of value those examples generate (youtube and amazon were already massively popular). Personally I find it very useful, but it's still hard to quantify. It's not exactly automating a whole class of jobs, although there are several youtube transcription services that this may make obsoete.

NitpickLawyer 8 hours ago | parent | prev | next [-]

> Shows how much more work there is still to be done in this space.

This is why I roll my eyes every time I read doomer content that mentions an AI bubble followed by an AI winter. Even if (and objectively there's 0 chance of this happening anytime soon) everyone stops developing models tomorrow, we'll still have 5+ years of finding out how to extract every bit of value from the current models.

agumonkey 6 hours ago | parent | next [-]

One thing though, if the slowdown is too abrupt, it might forbid openai, anthropic etc to keep financially running datacenters for us to use.

imiric 8 hours ago | parent | prev [-]

The idea that this technology isn't useful is as ignorant as thinking that there is no "AI" bubble.

Of course there is a bubble. We can see it whenever these companies tell us this tech is going to cure diseases, end world hunger, and bring global prosperity; whenever they tell us it's "thinking", can "learn skills", or is "intelligent", for that matter. Companies will absolutely devalue and the market will crash when the public stops buying the snake oil they're being sold.

But at the same time, a probabilistic pattern recognition and generation model can indeed be very useful in many industries. Many of our problems can be approached by framing them in terms of statistics, and throwing data and compute at them.

So now that we've established that, and we're reaching diminishing returns of scaling up, the only logical path forward is to do some classical engineering work, which has been neglected for the past 5+ years. This is why we're seeing the bulk of gains from things like MCP and, now, "agents".

NitpickLawyer 7 hours ago | parent [-]

> This is why we're seeing the bulk of gains from things like MCP and, now, "agents".

This is objectively not true. The models have improved a ton (with data from "tools" and "agentic loops", but it's still the models that become more capable).

Check out [1] a 100 LoC "LLM in a loop with just terminal access", it is now above last year's heavily harnessed SotA.

> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!

[1] - https://github.com/SWE-agent/mini-swe-agent

imiric 7 hours ago | parent [-]

I don't understand. You're highlighting a project that implements an "agent" as a counterargument to my claim that the bulk of improvements are from "agents"?

Sure, the models themselves have improved, but not by the same margins from a couple of years ago. E.g. the jump from GPT-3 to GPT-4 was far greater than the jump from GPT-4 to GPT-5. Currently we're seeing moderate improvements between each release, with "agents" taking up center stage. Only corporations like Google are still able to squeeze value out of hyperscale, while everyone else is more focused on engineering.

IanCal 6 hours ago | parent [-]

I think the point here is that it’s not adding agents on top but the improvements in the models allow the agentic flow.

emp17344 30 minutes ago | parent [-]

But that’s not true, and the linked agentic design is not a counterargument to the poster above. The LLM is a small part of the agentic system.

shermantanktop 8 hours ago | parent | prev | next [-]

Cobbler’s children…

Aiisnotabubble 8 hours ago | parent | prev [-]

[dead]