Philpax 6 days ago
> The number system differentiates the models based on capability, any other method would not do that.

Please rank GPT-4, GPT-4 Turbo, GPT-4o, GPT-4.1-nano, GPT-4.1-mini, GPT-4.1, GPT-4.5, o1-mini, o1, o1 pro, o3-mini, o3-mini-high, o3, and o4-mini in terms of capability without consulting any documentation.
umanwizard 6 days ago
Btw, as someone who agrees with your point, what’s the actual answer to this?
zeroxfe 6 days ago
There's no single ordering -- it really depends on what you're trying to do, how long you're willing to wait, and what kinds of modalities you're interested in.
newfocogi 6 days ago
I recognize this is a somewhat rhetorical question and your point is well taken. But something that maps well is car makes and models:
- Is Ford better than Chevy? (Comparison across providers) It depends on what you value, but I guarantee there are tribes that are sure there's only one answer.
- Is the 6th gen 2025 4Runner better than the 5th gen 2024 4Runner? (Comparison of the same model across new releases) It depends on what you value. It is a clear iteration on the technology, but there will probably be more plastic parts that will annoy you as well.
- Is the 2025 BMW M3 base model better than the 2022 M3 Competition? (Comparison across years and trims) Starts to depend even more on what you value.
Providers need to delineate between releases, and years, models, and trims help do this. There are companies that will try to eschew this and go the Tesla route without model years, but they still can't get away from it entirely. To a certain person, every character in "2025 M3 Competition xDrive Sedan" matters immensely; to another person it's just gibberish. But a pure ranking isn't the point.
mrandish 6 days ago
Yes, point taken. However, it's still not as bad as Intel CPU naming in some generations or USB naming (until very recently). I know, that's a very low bar... :-)
codingwagie 6 days ago
Very easy with the naming system?
chaos_emergent 6 days ago
I mean, this is actually straightforward if you've been paying even the remotest attention.

Chronologically: GPT-4, GPT-4 Turbo, GPT-4o, o1-preview/o1-mini, o1/o3-mini/o3-mini-high/o1-pro, gpt-4.5, gpt-4.1

Model iterations, by training paradigm:

SGD pretraining with RLHF: GPT-4 -> turbo -> 4o

SGD pretraining w/ RL on verifiable tasks to improve reasoning ability: o1-preview/o1-mini -> o1/o3-mini/o3-mini-high (technically the same product with a higher reasoning token budget) -> o3/o4-mini (not yet released)

Reasoning model with some sort of Monte Carlo Search algorithm on top of reasoning traces: o1-pro

Some sort of training pipeline that does well with sparser data, but doesn't incorporate reasoning (I'm positing here, training and architecture paradigms are not that clear for this generation): gpt-4.5, gpt-4.1 (likely fine-tuned on 4.5)

By performance: hard to tell! Depends on what your task is, just like with humans. There are plenty of benchmarks. Roughly, for me, the top 3 by task are:

Creative Writing: gpt-4.5 -> gpt-4o
Business Comms: o1-pro -> o1 -> o3-mini
Coding: o1-pro -> o3-mini (high) -> o1 -> o3-mini (low) -> o1-mini-preview
Shooting the shit: gpt-4o -> o1

This isn't to dismiss that their marketing nomenclature is bad, just to point out that it's not that confusing for people who are actively working with these models and have a reasonable memory of the past two years.