virgildotcodes a day ago

I really wish local models could compete with Codex, but they are miles apart for now. I'm not sure how that gap ever closes, unless local models at some point catch up to where 5.4 High is today.

Even then, the frontier models would likely have improved by an equivalent degree, so you'd again be faced with the same choice of deciding between a dramatically less effective local tool and a far more capable, closed remote model.

I guess there's going to be some point of "good enough" for most people.

I feel like the closed frontier models really got there around 8 months ago, and even more so ~4-6 months ago with the release of the Codex series and then Opus 4.6. It finally feels like you can get reliably good implementations of features that follow repo patterns and best practices, and, at least with 5.4 High/Xhigh Codex, code reviews that don't mostly surface hallucinated or superficial bullshit.

While I'm rambling, I feel like when/if local models ever do catch up to this point, the frontier models are going to be so damn good that software devs are truly fucked.

lrvick 14 hours ago

I do Linux kernel, compiler, and operating system dev with Qwen3.5 122b running locally on a Strix Halo 128G at 35 t/s. Those are pretty much the most complex software problems one can work on.

I think a lot of people just want to put in a credit card and press an easy button.

virgildotcodes 9 hours ago

Yeah, the easy button, if it translates to a more capable model that needs less hand-holding and manual correction and consistently produces better-quality code, is of course the point. You wouldn't want to go from Qwen3.5 122b back to GPT-3.5 for coding assistance.

People can definitely be productive with less powerful models. Supermaven or Cursor's tab autocomplete models from a year ago were already a huge boost over the pre-AI days. They just don't have the same capabilities as the leading models.

Curious if you've tried GPT-5.4 High through Codex to compare for your use case?