The benchmarks are very impressive. Codex and Opus 4.5 are really good coders already and they keep getting better.

No wall yet, and I think we might have already crossed the threshold of models being as good as or better than most engineers.

GDPval will be an interesting benchmark and I'll happily use the new model to test spreadsheet (and other office work) capabilities. If they can keep going like this just a little bit further, many office workers will stop being useful... I don't know yet how to feel about this.

Great for humanity probably, but for the individuals?

llmslave 2 days ago | parent | next [-]

Yeah, there's no wall on this. It will be able to mimic all of human behavior given proper data.

sheeshe 2 days ago | parent | prev | next [-]

Ok, so why aren't there mass layoffs ensuing right now?

ghosty141 2 days ago | parent [-]

Because from my experience using Codex in a decently complex C++ environment at work, it works REALLY well when it has things to copy. Refactorings, documentation, code review etc. all work great. But those things only help actual humans, and they also take time. I estimate that in a good case I save ~50% of the time; in a bad case it's negative and costs time. But what I generally found is that it's not that great at writing new code. Obviously an LLM can't think, and you notice that quite quickly: it doesn't create abstractions, use abstractions, or try to find general solutions to problems. People who get replaced by Codex are those who do repetitive tasks in a well-understood field, for example making basic websites, very simple CRUD applications etc. I also think it won't be layoffs; rather, companies will hire fewer freelancers or people to manage small IT projects.

ionwake 2 days ago | parent | prev [-]

It was only about 2-3 weeks ago that several HNers told me "nah you better re-check your code" when I explained that I have over 2 decades of coding experience, yet have not manually edited code (in memory) for the last 6 or so months, whilst doing daily 12-hour vibe code seshes.

ipsum2 2 days ago | parent | next [-]

It really depends on the complexity of the code. I've found models (codex-5.1-max, opus 4.5) to be absolutely useless at writing shaders or ML training code, but really good at basic web development.

nineteen999 2 days ago | parent | next [-]

Interesting, I've been using Claude Max with UE5 and while it isn't _brilliant_ with shaders I can usually get it to where I want. Also had a bit of success with converting HLSL shaders to GLSL with it.

ipsum2 a day ago | parent [-]

I've asked it to write some non-trivial three.js code and have not gotten it to succeed.

sheeshe 2 days ago | parent | prev [-]

Which is no surprise, as data for web development exists in large amounts on the web that the models feed on.

osn9363739 2 days ago | parent | prev [-]

Do you have any examples, or is your project OSS or anything like that? Because I want to believe, but I have people I work with who say and try the same thing (no manual coding), and their work is now terrible.