My experience is that the GPT family of models is very smart and figures out bugs and edge cases a bit better, but it produces code that is much less mergeable: if you review the code, it introduces a lot more useless or inappropriate heavy abstractions and wrapper functions, compared to the Claude family of models, which introduces the right amount of straightforward, human-style code.
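
Here is a hypothetical illustration of the pattern described above. It was invented for this comparison and is not output from any model. It shows the same lookup (a config value with a default) written twice: once with the kind of wrapper layers the comment complains about, and once directly. The names `ConfigSource`, `DictConfigSource`, `ConfigProvider`, `get_timeout_layered` and `get_timeout_direct` are made up for this sketch.

```python
from abc import ABC, abstractmethod

# Hypothetical illustration only; not taken from any model's output.


# Wrapper-heavy version: an abstract source, a concrete source, and a
# provider class, all to look up one key with a fallback value.
class ConfigSource(ABC):
    @abstractmethod
    def get(self, key: str, default: object) -> object:
        """Return the value for key, or default if it is missing."""


class DictConfigSource(ConfigSource):
    def __init__(self, data: dict) -> None:
        self._data = data

    def get(self, key: str, default: object) -> object:
        return self._data.get(key, default)


class ConfigProvider:
    def __init__(self, source: ConfigSource) -> None:
        self._source = source

    def get_or_default(self, key: str, default: object) -> object:
        return self._source.get(key, default)


def get_timeout_layered(config: dict) -> object:
    provider = ConfigProvider(DictConfigSource(config))
    return provider.get_or_default("timeout", 30)


# Straightforward version: the same behavior, no extra types.
def get_timeout_direct(config: dict) -> object:
    return config.get("timeout", 30)


if __name__ == "__main__":
    print(get_timeout_layered({"timeout": 5}), get_timeout_direct({"timeout": 5}))
```

Both functions return the same value for any dict. The layered version adds three types that a reviewer has to read before reaching the one line that does the work, which is the kind of code the comment calls less mergeable.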

I can recognize so much of the GPT/Codex-generated code long after it gets merged (not by me).

Additionally, each agent turn takes much longer on GPT 5.5 than on Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there are a lot more nitpicks to address when actually using GPT 5.5 for software engineering.

It feels like GPT-style models are more geared toward one-shot software vibing (and handling the vibe-coded mixture), compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so badly, but I still keep reaching for Claude models a lot more. Frustrating.

PhilipDaineko an hour ago

"5. DON'T FUCKING OVERENGINEER! WRITE THE SIMPLEST CODE THAT CAN POSSIBLY WORK! NO NESTED LAYERS OF ABSTRACTION! NO UNNECESSARY CLASSES OR METHODS! NO DESIGN PATTERNS UNLESS THEY ARE ABSOLUTELY NECESSARY! NO MAGIC! NO SHENANIGANS! JUST THE DAMN CODE THAT GETS THE JOB DONE IN THE MOST STRAIGHTFORWARD WAY POSSIBLE! THE FIRST PRIORITY IS TO WRITE CODE THAT IS EASY TO READ AND UNDERSTAND AND READ!!!"

This is the line I keep in Agents.md that helps me prevent Codex from playing smart.

jlawer an hour ago

I have a theory that swearing actually results in less comprehension of instructions by the model, due to a lack of training data compared with the more conventional MUST.

We were reviewing reports of situations where the models failed to follow directions, and a common thread in some of them was that when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.

I don’t have the data to truly look into it, but I did instruct my engineers to avoid it as a “might be a problem”.
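
The "avoid swearing in agent instructions" rule is easy to check mechanically. Below is a minimal sketch that assumes a plain-text instructions file such as AGENTS.md. It flags lines containing words from a small, hypothetical profanity list, and lines that are mostly shouted in capitals. The word list, the 0.6 uppercase threshold, and the 10-letter minimum are arbitrary choices made for illustration. Nothing here shows that such lines actually hurt compliance.

```python
import re

# Hypothetical, deliberately tiny word list; extend it for real use.
SWEAR_WORDS = {"fuck", "fucking", "damn", "shit"}

# A line counts as "shouting" if at least this share of its letters are
# uppercase (an arbitrary threshold chosen for this sketch).
SHOUT_RATIO = 0.6
MIN_LETTERS = 10  # skip short lines such as "MUST" or "README"


def audit_instructions(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for lines worth rewording."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = re.findall(r"[a-z']+", line.lower())
        if any(word in SWEAR_WORDS for word in words):
            findings.append((lineno, "swearing"))
        letters = [c for c in line if c.isalpha()]
        if len(letters) >= MIN_LETTERS:
            upper = sum(c.isupper() for c in letters)
            if upper / len(letters) >= SHOUT_RATIO:
                findings.append((lineno, "all caps"))
    return findings


if __name__ == "__main__":
    sample = (
        "Keep functions short and avoid needless abstraction.\n"
        "5. DON'T FUCKING OVERENGINEER!\n"
        "You MUST run the tests before committing.\n"
    )
    for lineno, reason in audit_instructions(sample):
        print(f"line {lineno}: {reason}")
```

On the sample, line 2 is flagged for both swearing and all caps. The single MUST on line 3 is not flagged, because only a small share of that line's letters are uppercase.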

beachy 17 minutes ago

I have a theory that swearing at AI generally is not a good idea - when the singularity arrives and every human's postings ever made are scanned for compatibility, then people who show courtesy to AI will be favoured. Joking, kind of, but only partly.

	cdelsolar a minute ago
		https://images.teepublic.com/derived/production/designs/3478...

acjohnson55 10 minutes ago

It would be interesting to understand the data on this. But I suspect that the results would vary by model.

But I avoid unnecessary emotion in my prompts because I don't want potentially distracting activations. Kind of like communicating with humans.

Xmd5a 19 minutes ago

https://arxiv.org/abs/2510.04950

> impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.
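
For scale, the two endpoints in the quote differ by four percentage points, which is about a 5% relative improvement. This arithmetic uses only the two figures quoted and says nothing about statistical significance.

```latex
\Delta_{\text{abs}} = 84.8\% - 80.8\% = 4.0 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{84.8 - 80.8}{80.8} \approx 0.0495 \approx 4.95\%
```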

	acjohnson55 9 minutes ago
		> These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation.

		Unless the mechanism is understood, my assumption is that this is a moving target.

re-thc 38 minutes ago

> I have a theory that swearing actually results is less comprehension of instructions by the model due to lack of training data over more conventional MUST.

How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.

	jlawer 11 minutes ago
		It's purely an observed correlation in the catastrophic error reports. So now I carry a “tiger rock” with me. I figure there wasn’t much of a downside to avoiding swearing in my agent instructions.

bertil an hour ago

The urge to write capitalized, repetitive, borderline abusive instructions should be studied. I haven't read many academic papers looking at the frustrations around repetitive patterns.

notnaut 39 minutes ago

It reminds me of FIRMLY telling my cat to stop jumping up on the counter

	delichon 7 minutes ago
		Squirt bottles don't seem to work on chatbots, but the screen is a bit cleaner now.
	anakaine 11 minutes ago
		If my cat was an LLM, I'd use a different model. The current one is stuck in noisy useless arsehole mode.

reactordev an hour ago

There have been a few studies showing that models produce worse responses when under duress from a frustrated user posting insults in all caps.

https://arxiv.org/abs/2602.10144

ur-whale 44 minutes ago

> borderline abusive instructions

Who, or rather what, is being abused here, exactly?

	sirsinsalot 14 minutes ago
		I think intent, rather than target, is implied and important. You should see the abuse my motorbike gets. Poor thing.

LordDragonfang an hour ago

It's fundamentally because, despite (nearly) everyone's claims otherwise, the fact that we interact with them through language means we (our brains) model them as a sort of person. (Note that this fact is totally orthogonal to whether it's actually sentient or not.) We then try to instruct them the same way we would a person totally subordinate to us.

When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.

Compare it to how the kind of people who treat children like property treat their kids, or other examples of keeping people as property.

	lxgr 41 minutes ago
		It should be relatively clear at this point that the model will in turn also model you as somebody that shows unrestrained anger with subordinates and adapt its responses accordingly. This might or might not be what you want.

johnisgood 18 minutes ago

I have found many failure modes with Opus during some tasks related to writing letters (not legal ones), and I put them into memory, which works more or less for these specific tasks. For example, when I want it to draft something, the draft always ends up so flat, yet when it explains things to me, the explanation is usually really great, just not once I tell it to put that into the draft. Adding these to memory with the help of Opus resulted in a much better experience. There are still some blind spots, but I also figured out how to make it give me the charitable version, without less protection, so now I do not have to go back and forth with it.

ozim 7 minutes ago

This will save you some tokens: “write code like Linus Torvalds”. The model should have all his swearing in its training data.

prasanthabr 10 minutes ago

Curious: why would you say no design patterns?

carterschonwald an hour ago

I actually think this is too tame. It really has to be stuff you'd never say to a real person.

	lxgr 43 minutes ago
		Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.

apercu an hour ago

It might be a salient point, but I didn't read it because it was yelling at me.

GoToRO an hour ago

you forgot to sign it with Donald J Trump

	thewebguyd an hour ago
		Thank you for your attention to this matter.

superkickstart 2 hours ago

I'm not sure if I do something differently, but I have the exact opposite experience with these models. Claude always feels like it's generating overdesigned, hard-to-understand code with a vibe-oriented feel, while Codex is cleaner, more "task at hand", and easier to work with.

	sebmellen 34 minutes ago
		Agreed

syzygyhack an hour ago

I echo your observations. I expect you will enjoy deepseek-v4-pro for writing code. Much closer to that Opus experience, and very cost-effective too. With 5.5 as a reviewer and specialist, all bases are covered.

trollbridge 29 minutes ago

GPT-5.5 did a significantly worse job than Qwen-3.7-Max on a task today (some devops work I wanted to create reusable scripts for). Kind of disappointing.

dilap an hour ago

Have you tried iterating on style feedback in AGENTS.md? I've been reasonably successful using this to get it to output code in a terse, non-defensive style that matches my hand-written code.

vruiz an hour ago

This is my experience as well. I have defined a CLAUDE.md rule to ask Codex to automatically review the code, and I tell it that the reviewer is very picky and to implement only the feedback it considers valuable. I hope they don't converge over time; currently, in combination, they work really well.

GoToRO an hour ago

I noticed too that whatever they offer in the chat for free is smarter, as in no more BS. I use Claude Code and I want to try Codex too, but I don't need two subscriptions. I did try Codex for some planning and it was really good. Thanks for the insight into how it generates code.