aresant 3 days ago

Feels like a mixed bag rather than a regression?

E.g., GPT-5 beats GPT-4 on factual recall and reasoning (HeadQA, Medbullets, MedCalc).

But then it slips on structured queries (EHRSQL), fairness (RaceBias), and evidence QA (PubMedQA).

Hallucination resistance is better, but only modestly.

Latency seems uneven (maybe it needs more testing?): faster on long tasks, slower on short ones.

TrainedMonkey 3 days ago

GPT-5 feels like cost engineering. The model is incrementally better, but they are optimizing for the least amount of compute. I am guessing investors love that.

narrator 3 days ago

I agree. I have found GPT-5 significantly worse on medical queries. It feels like it skips important details and is much worse than o3, IMHO. I have heard good things about GPT-5 Pro, but that's not cheap.

I wonder if part of the degraded performance comes from cases where they think you're heading into a dangerous area and the model gets more and more vague, like the fireworks example they demoed on launch day. It gets very vague when talking about non-abusable prescription drugs, for example. I wonder if that sort of nerfing gradient is affecting medical queries.

After seeing some painfully bad results, I'm currently using Grok 4 for medical queries with a lot of success.

fertrevino 3 days ago

Interesting, it seems the anecdotal experience agrees with the benchmark results.

rbinv 3 days ago

AFAIK, there is currently no "GPT-5 Pro". Did you mean o3-pro or o1-pro (via API)?

Currently, GPT-5 sits at $10/1M output tokens, o3-pro at $80, and o1-pro at a whopping $600: https://platform.openai.com/docs/pricing

Of course this is not indicative of actual performance or quality per $ spent, but according to my own testing, their performance does seem to scale in line with their cost.

mastercheif 3 days ago

GPT-5 Pro is only available on ChatGPT with a ChatGPT Pro subscription.

Supposedly it fires off multiple parallel thinking chains and then essentially debates with itself to net a final answer.

yzydserd 3 days ago

GPT-5 Pro is available through the ChatGPT UI with a “Pro” plan. I understand that, like o3-pro, it is a high-compute, large-context invocation of the underlying models.

rbinv 3 days ago

Thanks, I was not aware! I thought they offered all their models via their API.

RestartKernel 3 days ago

I wonder how that math works out. GPT-5 keeps triggering a thinking flow even for relatively simple queries, so each token must be an order of magnitude cheaper to make this worth the trade-off in performance.
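
One way to put numbers on that trade-off: if a thinking response emits R reasoning tokens on top of A answer tokens, it only costs no more than a direct A-token answer when the per-token price is at least (R + A) / A times lower. Below is a minimal sketch in Python. It assumes reasoning tokens are billed like ordinary output tokens; the helpers `break_even_price_ratio` and `response_cost` are hypothetical, the token counts are made up for illustration, and only the $10/1M output price comes from this thread.

```python
def break_even_price_ratio(reasoning_tokens: int, answer_tokens: int) -> float:
    """Hypothetical helper: factor by which the per-token price must drop for a
    thinking response (reasoning + answer tokens) to cost no more than a direct
    answer of the same length. Assumes reasoning tokens are billed as output."""
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    return (reasoning_tokens + answer_tokens) / answer_tokens


def response_cost(tokens: int, usd_per_million: float) -> float:
    """Hypothetical helper: USD cost of `tokens` output tokens at a price per 1M tokens."""
    return tokens * usd_per_million / 1_000_000


# Made-up simple query: 200 answer tokens, plus 1,800 reasoning tokens when thinking triggers.
answer, reasoning = 200, 1_800
ratio = break_even_price_ratio(reasoning, answer)
print(f"thinking must be {ratio:.0f}x cheaper per token to break even")

# Same $10/1M output price for both, to show the raw cost gap at equal pricing.
print(f"direct:   ${response_cost(answer, 10.0):.4f}")
print(f"thinking: ${response_cost(answer + reasoning, 10.0):.4f}")
```

With a 9:1 reasoning-to-answer ratio the break-even factor is 10x, i.e. the "order of magnitude" in the comment; shorter reasoning traces lower that bar proportionally.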

JimDabell 3 days ago

I’ve found that it’s super likely to get stuck repeating the exact same incorrect response over and over. It used to happen occasionally with older models, but it happens frequently now.

Things like:

Me: Is this thing you claim documented? Where in the documentation does it say this?

GPT: Here’s a long-winded assertion that what I said before was correct, plus a link to an unofficial source that doesn’t back me up.

Me: That’s not official documentation and it doesn’t say what you claim. Find me the official word on the matter.

GPT: Exact same response, word-for-word.

Me: You are repeating yourself. Do not repeat what you said before. Here’s the official documentation: [link]. Find me the part where it says this. Do not consider any other source.

GPT: Exact same response, word-for-word.

Me: Here are some random words to test if you are listening to me: foo, bar, baz.

GPT: Exact same response, word-for-word.

It’s so repetitive that I wonder if it’s an engineering fault, because it’s weird that the model would be so consistent in its responses regardless of the input. Once it gets stuck, it doesn’t matter what I enter, it just keeps saying the same thing over and over.

namibj 3 days ago

Go back and edit a prompt of yours in the conversation instead of continuing with garbage in the context.

ForHackernews 3 days ago

That's a good tip, I didn't know you could do that.

slashdev 3 days ago

If one conversation goes in a bad direction, it's often best to just start over. The bad context often poisons the existing session.

TrainedMonkey 3 days ago

That sounds like query caching... which would also align with the cost-engineering angle.

UltraSane 3 days ago

Since the routing is opaque they can dynamically route queries to cheaper models when demand is high.

yieldcrv 3 days ago

Yeah, look at their open-source models and how you get such high parameter counts in such low VRAM.

It's impressive, but a regression for now, in direct comparison to a plain high-parameter model.

woeirua 3 days ago

Definitely seems like GPT-5 is a very incremental improvement. Not what you’d expect if AGI were imminent.

p1esk 2 days ago

What would you expect?

fertrevino 3 days ago

Mixed results indeed. While it leads the benchmark on two question types, it falls short on others, which results in the slight overall regression.