eurekin 2 days ago
My very brief interaction with GPT5 is that it's just weird. "Sure, I'll help you stop flirting with OOMs", "Thought for 27s Yep-..." (this comes up a lot), "If you still graze OOM at load", "how far you can push --max-model-len without more OOM drama" - all this in a prolonged discussion about CUDA and various LLM runners. I've added special user instructions to avoid flowery language, but they get ignored.

EDIT: it also dragged the conversation on for hours. I ended up going with the latest docs, and finally all the CUDA issues in a joint tabbyApi and exllamav2 project cleared up. It just couldn't find a solution and kept proposing whatever people wrote in similar issues. Its reasoning capabilities are, in my eyes, greatly exaggerated.
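
For anyone hitting the same wall: most of the "how far can the context go" question is just KV-cache arithmetic. A rough back-of-the-envelope sketch (the layer/head/VRAM numbers below are illustrative Llama-ish defaults, not read from my actual setup):

    def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                       bytes_per_elem=2, batch=1):
        """Bytes needed for the K and V caches at a given context length."""
        per_token_per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem  # K + V
        return per_token_per_layer * n_layers * seq_len * batch

    free_vram = 3 * 1024**3  # VRAM left after weights -- assumed, measure yours
    for ctx in (4096, 8192, 16384, 32768):
        need = kv_cache_bytes(ctx)
        print(f"{ctx:>6} tokens -> {need / 1024**3:.2f} GiB "
              f"({'fits' if need < free_vram else 'OOM risk'})")

If your runner quantizes the KV cache (exllamav2 can), bytes_per_elem shrinks accordingly, so treat the output as a rough upper bound rather than an exact number.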
mh- 2 days ago | parent
Turn off the setting that lets it reference chat history; it's under Personalization. Also take a peek at what's in Memories (which is separate from the above); consider cleaning it up or disabling it entirely.