In my current experience:

- o3 is the bestest and my go-to, but its strength comes from combining reasoning with search - it's the one model you can count on to find things out for you instead of going off vibes and training data;

- GPT-4.5 feels the smartest, but it has tight usage limits and doesn't do search like o3 does; I use it when I need something creative done, or switch to it mid-conversation to have it reason off an already primed context;

- o4-mini / o4-mini-high - data transformation, coding stuff that doesn't require looking things up - especially when o3 looked stuff up already, and now I just need ChatGPT to turn it into code/diagrams;

- GPT-4o - only for image generation, and begrudgingly when I run out of quota on GPT-4.5.

o3 has been my default starting model for months now; most of my queries generally benefit from having a model that does autonomous reasoning+search. Agentic coding stuff, that I push to Claude Code now.
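The list above can be read as a rough routing rule. Here is a minimal sketch of it in Python; the `Task` fields, the `pick_model` function, and the order of the checks are assumptions made up for illustration, and the returned strings are just labels from the list, not API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a query; every field is an illustrative assumption."""
    agentic_coding: bool = False    # multi-step coding work in a repo
    image_generation: bool = False  # the output should be an image
    needs_search: bool = False      # the answer depends on looking things up
    creative: bool = False          # creative writing or open-ended thinking
    transformation: bool = False    # data transformation / coding from known facts
    gpt45_quota_left: bool = True   # whether any GPT-4.5 quota remains


def pick_model(task: Task) -> str:
    """Return a model label following the preferences in the list above."""
    if task.agentic_coding:
        return "Claude Code"
    if task.image_generation:
        return "GPT-4o"
    if task.needs_search:
        return "o3"
    if task.creative:
        # Fall back to GPT-4o only when the GPT-4.5 quota has run out.
        return "GPT-4.5" if task.gpt45_quota_left else "GPT-4o"
    if task.transformation:
        return "o4-mini"
    # Default starting model: reasoning plus search.
    return "o3"


if __name__ == "__main__":
    print(pick_model(Task(needs_search=True)))
    print(pick_model(Task(creative=True, gpt45_quota_left=False)))
```

The order of the checks is a judgment call: search comes before the creative and transformation cases because the list treats "does this need looking things up?" as the main question.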

agos 12 hours ago | parent | next [-]

the fact that one needs to know stuff like this and that it changes every three months seriously limits the usefulness of LLMs for me

	thedevilslawyer an hour ago | parent | next [-]
		Being on the cutting edge isn't for everyone. If you can find an island where staying updated is optional, you can choose that. Imo, these islands are fast shrinking.
	TeMPOraL 6 hours ago | parent | prev [-]
		I get this. On the one hand, those things I wrote down are just simple conclusions from immediate experience, not something I had to learn or feel burdened by - but on the other hand, when I look at similar lists for e.g. how to effectively use Claude Code, I recoil in horror. There's a silver lining in this, though: none of that is any kind of deep expertise, so there's no need for up-front investment. Just start using a tool and pay attention, and you'll pick up on those things in no time.

andrepd 13 hours ago | parent | prev [-]

I've heard my grandma talk about Catholic saints and their powers with a not dissimilar kind of discourse.

TeMPOraL 13 hours ago | parent [-]

Point being?

Unlike Catholic saints, ChatGPT models actually exhibit these properties in a directly observable and measurable way. I wrote how I decide which model to use for actual tasks, not which saint to pray to.

andrepd 11 hours ago | parent [-]

My grandma also uses saints for actual tasks (e.g. St Anthony for finding lost items), and they exhibit those properties in observable ways (e.g. he found her sewing needles just last month). Perhaps the comparison is more appropriate than you realise.

> actually exhibit these properties in directly observable and measurable way

Well but do they? I don't mean your vibes, and I also don't mean cooked-up benchmarks. For example: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

thedevilslawyer an hour ago | parent | next [-]

If not users' opinions or objective benchmarks, then what? Sounds like you prefer covering your ears and saying 'nananana...'

TeMPOraL 6 hours ago | parent | prev [-]

> Perhaps the comparison is more appropriate than you realise.

Or perhaps you stop being obtuse. There's no causal connection between "using saints for actual tasks" and the outcomes, which is why we call this religion. In contrast, you can see the cause-and-effect relationship directly and immediately with LLMs - all it takes is going to chatgpt.com or claude.ai, typing in a query, and observing the result.

> Well but do they? I don't mean your vibes, and I also don't mean cooked-up benchmarks.

Do read the study itself, particularly the parts where the authors spell out exactly what is and isn't being measured.

	andrepd 3 hours ago | parent [-]
		It's really simple x) Either the "observation" is just vibes, and then it's fundamentally the same as when Gran's knees get better after she asks Saint Euphemia, or it's actually a scientific observation, in which case please post! :) You may not like it, but it is what it is.