> I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)

Truly fascinating ecosystem and community in general, as experiences differ so wildly. Anthropic's models seem far behind OpenAI's to me, especially when you get into "Pro" territory, and there doesn't seem to be any worthy competition to Pro Mode available at all.

And this comes from someone who uses both platforms and spends a lot of the day interacting with agents and LLMs in various ways. The interesting part is that you probably do too, and what you share probably lines up with what you experience! Yet we come away with basically opposite takeaways :) I don't think either of us is wrong, somehow.

haellsigh 7 hours ago | parent | next [-]

I agree with what you're saying. I have a Claude plan for work and I prefer using Claude over any other LLM I've tried. Having recently tried the Codex 100€ plan with GPT-5.5 in high/xhigh, I don't think it's worse than the Opus models, just different.

I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.

Just my two cents.

embedding-shape 6 hours ago | parent [-]

> I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.

Yeah, exact prompting matters a lot, seemingly more than people think. There are definitely tradeoffs in how literally the models take the prompts. On one hand, it's useful for a model to ignore its own instinct when you know better, so it doesn't go chasing geese randomly. On the other hand, it's sometimes useful when it self-directs, for example when you misworded something and it's obvious from the context that you meant something different. They're basically good at different things. I really agree that the models aren't equal, and that they aren't as interchangeable without adjusting how you prompt them as people seem to think.

WarmWash 5 hours ago | parent | prev | next [-]

People use a model as their daily driver, get very familiar with it and its behavior, and then go and use another model and have a hard time. It's very difficult to separate "the model is bad" from "the model works differently".

JumpCrisscross 4 hours ago | parent [-]

> It's very difficult to separate "the model is bad" from "the model works differently"

At which point it’s fair to reject the commoditization label.

Also missing from these discussions is e.g. Qwen, which is at least as good as one back from OpenAI or Anthropic’s frontiers.

embedding-shape 3 hours ago | parent [-]

> Also missing from these discussions is e.g. Qwen, which is at least as good as one back from OpenAI or Anthropic’s frontiers.

They're missing from the discussion because the ones you can run locally aren't actually "one step away from other closed-source labs" in practice when you use them. They might benchmark as such, but sadly they're far from measuring up to those scores except for very specific use cases, even when you have, say, 96GB of VRAM available to run the bigger models that most (at-home) consumers won't be able to run.

JumpCrisscross 3 hours ago | parent [-]

> the ones you can run locally aren't actually "one step away from other closed-source labs"

And they probably won’t be for at least another decade. Comparing like with like, i.e. each flagship model running on the best hardware it can run on, Qwen is close.

embedding-shape 3 hours ago | parent [-]

> Qwen is close

I wish so badly this were true, but sadly today it just isn't.

JumpCrisscross 3 hours ago | parent [-]

To be clear, I’m relaying my subjective experience comparing Opus and Qwen.

computerex an hour ago | parent | prev | next [-]

For HPC/AI work, Opus blows GPT away; it’s no competition.

alecco 7 hours ago | parent | prev [-]

When you say "Pro" territory, do you include Fable?

embedding-shape 7 hours ago | parent [-]

You mean the model that was available for a whole three days? No, I played around with it a tiny bit, but not much more than that. I guess time will tell if it gets close.