wincy 4 hours ago

Just tried it out for a prod issue I was experiencing. Claude never does this sort of thing: I had it write an update statement after doing some troubleshooting, and when I said “okay, let’s write this in a transaction with a rollback,” GPT-5.5 gave me the old “okay,” followed by:

BEGIN TRAN;

-- put the query here

commit;

I feel like I haven’t had to prod a model to actually do what I told it to in a while, so that was a shock. I guess it does use fewer tokens that way; it’s just annoying, when I’m paying for the “cutting edge” model, to have it be lazy on me like that.

This was in Cursor: the model popped up in the model selector, so I tried it out.
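For reference, the shape I was asking for is: open an explicit transaction, run the update, check the result, then ROLLBACK (or COMMIT once you trust it). A minimal sketch of that pattern, using Python's sqlite3 and a made-up table since I can't share the real query:

```python
import sqlite3

# Made-up demo table; in prod this would be the real database connection.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage transactions manually
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

conn.execute("BEGIN")  # open an explicit transaction
conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")

# Inspect the effect of the update while still inside the transaction.
inside = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]

conn.execute("ROLLBACK")  # undo everything; swap for COMMIT once verified

# After rollback, the original value is back.
after = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(inside, after)
```

The same pattern in T-SQL would be `BEGIN TRAN;` / the update / a verifying `SELECT` / `ROLLBACK;`, with the `COMMIT;` only issued once the row counts look right.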

XCSme 3 hours ago | parent | next [-]

I feel like the last 2-3 generations of models (after gpt-5.3-codex) didn't really improve much, they just shuffled stuff around and made different tradeoffs.

pixel_popping 3 hours ago | parent [-]

I disagree; it improved enormously, especially at staying consistent on long tasks. I have a task that's been running for 32 days (400M+ tokens) via Codex, and that's only been possible since gpt-5.4.

ericpauley 3 hours ago | parent | next [-]

Has that task accomplished anything yet?

elAhmo a minute ago | parent | next [-]

It made Sam richer.

codemog 3 hours ago | parent | prev | next [-]

I think the OP is in for a rude surprise when the task is “finished”.

hagbard_c 2 hours ago | parent [-]

It will go somewhat like this:

"You're really not going to like it," observed Codex.

"Tell us!"

"All right," said Codex. "The answer to your Great Question..."

"Yes...!"

"Is..." said Codex, and paused.

"Yes...!"

"Is..."

"Yes...!!!...?"

"Forty-two," said Codex, with infinite majesty and calm.

pixel_popping 2 hours ago | parent [-]

I bet you've asked Codex for that joke :p

xp84 3 hours ago | parent | prev | next [-]

Too soon to tell; give it a billion tokens before we make up our minds.

pixel_popping 3 hours ago | parent [-]

Oh boy, you're far off from what it requires; we're probably talking 3B+. And note that's just Codex. Obviously Codex is also doing automatic adversarial passes against the regular zoo (gemini-3.1-pro-preview, opus-4.6/4.7, gpt-5.3-codex, minimax-2.7, glm-5.1, mimo-2 (now 2.5), and so on, you get the gist) :)

fl4regun 3 hours ago | parent [-]

what is that task doing???

SecretDreams 3 hours ago | parent | prev [-]

Kept the OP employed for a full extra month at their high AI metric firm, hopefully.

pixel_popping an hour ago | parent [-]

Just making Jensen proud is all.

lowdude 3 hours ago | parent | prev | next [-]

That’s actually crazy. What kind of task is that? Is it a recurring kind of task, like some analysis, or is it coding-related?

pixel_popping 3 hours ago | parent [-]

Coding (along with docs and tests, obviously): rewriting a huge chunk of the KVM hypervisor (in Kernel 7, started at -rc2), plus KSM and other modules. I can't say too much about it yet (might do an announcement in the coming weeks). The coding is automated, but the plan took days of manual arguing with every model possible beforehand (while I did other things during the waiting times, since I currently manage 70 repos for an upcoming release of our Beta).

I think users really underestimate the capabilities of "AI" when using the right tooling, combinations of models, and procedures (and loops). That's speaking with 2 decades of dev behind me. Genuinely, I'm not on the same page as people saying it produces slop of any kind; at this stage it's mostly the fault of the prompter (or the prompter not having enough tokens to do mass adversarial passes). I can genuinely state that the code produced is overall the SAME quality as what I would write by being extremely meticulous.

I'm like a bot following 30+ threads concurrently. Sometimes it's fun, sometimes it feels like playing casino, sometimes it's boring. But this is truly an insane era if you have the funding for it. Obviously we stack many, MANY accounts in rotation 24/7; the equivalent API cost for me alone would be about $100K+ a month, but we pay only a fraction of that thanks to the plans.

PS: I have 8 monitors in front of me to manage all that (portable monitors stacked together).

Urahandystar 2 hours ago | parent | next [-]

Please do an update when you're ready, this sounds like madness to me so I'd love to see what the output is. Whatever it is I have to know.

ericreg92 an hour ago | parent | prev | next [-]

Please do a post about this (though I realize that takes time). This sounds amazing. I have always dreamed of doing this too but just don't have the budget.

7thpower 11 minutes ago | parent | prev | next [-]

I have yet to talk to someone who is taking this approach and doesn’t end up with a dumpster fire, but here's to hoping this time is different.

Hope it works and you post about it.

jamwil 35 minutes ago | parent | prev [-]

I’m vague on a specific reason for this feeling because there are a few to choose from and no single one overpowers the others, but the emotion that comes to mind when I read this is disgust. As a society I feel we will look back on the subsidized opulence of this moment with total and utter contempt.

holmesworcester 22 minutes ago | parent [-]

Or nostalgia for simpler times

jamwil 6 minutes ago | parent [-]

That as well. But everyone reading GP’s posts knows in their bones that it’s unsustainable. It’s economically unsustainable and environmentally unsustainable, and in that context it strikes me as pure hoarding behaviour. Taking as much as they can for themselves before the house of cards crashes down.

I have no sympathy for OpenAI or Anthropic as corporations, but if these are the new tools of the trade, then platform abuse like GP is bragging about serves only to destroy the livelihoods of the rest of us who are content to use our fair share.

There’s no such thing as a free lunch, and the bill always comes at the end.

r_lee 3 hours ago | parent | prev [-]

...what? what kind of a task are you running?

endymi0n 3 hours ago | parent | prev | next [-]

OpenAI is the first company that has reached a level of intelligence so high, the model has finally become smart enough to make YOU do all the work. Emergent behavior in action.

In all earnestness, though, OpenAI’s oddly specific singular focus on “intelligence per token” (also in the benchmarks), which literally no one else pushes so hard, eerily reminds me of Apple’s MacBook anorexia era pre-M1. One metric to chase at the cost of literally anything else. GPT-5.3+ are some of the smartest models out there and could be a pleasure to work with, if they weren’t lazy bastards to the point of being completely infuriating.

syspec 3 hours ago | parent | prev | next [-]

Can't tell if the above is good or bad.

hbn 3 hours ago | parent | prev [-]

GPT-5.5 shatters benchmarks for amount of faith it puts in the user.