Insanity 2 hours ago
I wonder how much the 'inflection point' is a thing vs marketing. I'm sure the models got somewhat better, but even now when I'm trying to 'vibe code' a game with the latest models (combination of Codex w/ gpt5.5 and gpt5.3-codex), they really do struggle. They definitely get something barebones up and running, but it's far from a fully fledged application.
kvakkefly 38 minutes ago
I remember this very clearly myself. Before Opus 4.5, I was doing a lot of hand-holding and was coding a lot myself, but I have not written code since that day, more or less. I did write some stuff myself just to learn how the Enigma encryption machine worked. But professionally, I stopped coding in November.
bluegatty an hour ago
Paradox: you can get multiple inflection points even as systems start to have diminishing marginal returns in core capability. I think this is due to 'threshold crossing', where something 'becomes good enough for a specific purpose' and that unlocks capabilities. 'Nail guns' used to be heavy, required heavy power cords, and were extremely expensive. When they got lighter, cheaper, and battery-powered, at some point they blended seamlessly into the roofer's process and dramatically multiplied the work that could be done. Marginal improvements beyond that may not yield the same 'unlocks' because the threshold has been crossed.
minimaxir an hour ago
Opus 4.5 in November 2025 was legitimately, unironically an inflection point and is the sole reason for the current hysteria. GPT 5.5 is a significant improvement over GPT 5.4 but I wouldn't call it an inflection.
halflife an hour ago
I feel the change. It went from an autocomplete tool to an agent running 5 tasks in parallel while I just supervise. The improvement is enormous.
adgjlsfhk1 an hour ago
It's very real. Just in the past 2 months or so IMO there's been a pretty big improvement in Claude for local dev (although I think a lot of that is less model strength and more harness capability). 1M context is a huge difference (~30 min vs 2.5 hr between compactions significantly increases the scope of what I can get the AI to do before it goes stupid). The other biggest difference I've noticed is a better balance of actually doing the work vs pushing back on bad ideas. I want the AI to tell me if it thinks the thing I am telling it is wrong or a bad idea, but if I confirm, I want it to do that anyway. A couple of months ago, Claude was a lot more likely to either say "This is too much work, I'm not going to do all of it", tell me the idea was genius (and then pretend to do it), or do something equally useless.
xbmcuser an hour ago
It's real for me as a non-coder. Previously, uploading a Python script and asking it to add this function or that function used to break it; now it usually just works, at least with Claude and ChatGPT models. Google Gemini still breaks stuff, but rumors are that their new Flash model, which will be announced soon, is very good. I am usually working with data in CSV files and generating spreadsheets, PDFs, etc., and the results for that have improved dramatically.
DeathArrow 34 minutes ago
Pure vibe coding won't work. You need to define an excellent architecture, have great specs and a solid plan, divide the plan into small phases that fit well in a context window, use TDD and automated code reviews for implementing each phase, and do QA and some code review. At any point you need to have agents review, verify and test the other agents' output and iterate until the output is perfect. And also, have good e2e tests. IMO, if you don't spend at least a few tens of millions of tokens per day, you aren't doing it properly.