jjani | 5 days ago
My chats so far have been similar to yours: across the board worse than o3, never better. I've had cases where it completely misinterpreted what I was asking for, a very strange experience I'd never had with the other frontier models (o3, Sonnet, Gemini Pro). Those would of course get things wrong and make mistakes, but never completely misunderstand what I was asking. I tried the same prompt on Sonnet and Gemini and both understood correctly. It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of as a developer of the service, even though it was plenty clear to any human (and to the other models) that I meant the latter.
faizshah | 5 days ago
> I've had cases where it completely misinterpreted what I was asking for, a very strange experience which I'd never had with the other frontier models (o3, Sonnet, Gemini Pro).

Yes! This exactly. With o3 you could word your question imprecisely or ambiguously and it would still figure out what you meant; with GPT-5 I've had several cases in just the last few hours where it misunderstands the question and requires refinement.

> It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of a developer of the service, even though it was plenty clear to any human - and other models - that I meant the latter.

For me, o3 had become a daily-life tool. Yesterday, for example, we were playing a board game and I asked GPT-5 Thinking to clarify a rule. I deliberately used an ambiguous prompt: a picture of a card with a "draw 1 card" power, plus the question "Is this from the deck or both?" (from the deck or from the board). Instead of clarifying the power on the card, it told me the card in the picture was from the game Wingspan's deck (o3 would never). I'm not looking forward to how much time this will waste on my weekend coding projects.
SomewhatLikely | 5 days ago
The default outputs are considerably shorter, even in thinking mode. What helped me get thinking mode back to an acceptable state was switching to the "Nerd" personality and, in the traits customization setting, telling it to be complete and to add extra relevant details. With those additions it compares favorably to o3 on my recent chat history and even improves on it in some cases. I'd rather scan a longer output than have the LLM guess what to omit, but I know many people have complained about verbosity, so I can understand why they moved toward shorter answers.
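For anyone hitting the same thing through the API rather than the ChatGPT settings, here's a minimal sketch of the rough equivalent, assuming the GPT-5 Responses API verbosity and reasoning-effort parameters described at launch (parameter names may differ in your SDK version, and the prompt text is just an illustration):

    # Rough API-side equivalent of the "be complete, add extra details" trait.
    # Assumes the openai Python SDK and the GPT-5 launch parameters
    # (text.verbosity, reasoning.effort); verify against your SDK version.
    from openai import OpenAI

    client = OpenAI()

    response = client.responses.create(
        model="gpt-5",
        reasoning={"effort": "high"},   # more deliberation, closer to o3
        text={"verbosity": "high"},     # longer, more complete answers
        instructions="Be complete and include extra relevant details "
                     "rather than guessing what to omit.",
        input="Example question here.",
    )

    print(response.output_text)

The idea is the same as the traits setting: raise verbosity once, globally, instead of re-prompting for detail on every message.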