comex 6 days ago

Sam Altman wrote in February that GPT-4.5 would be "our last non-chain-of-thought model" [1], but GPT-4.1 also does not have internal chain-of-thought [2].

It seems like OpenAI keeps changing its plans. Deprecating GPT-4.5 less than 2 months after introducing it also seems unlikely to be the original plan. Changing plans isn't necessarily a bad thing, but I wonder why.

Did they not expect this model to turn out as well as it did?

[1] https://x.com/sama/status/1889755723078443244

[2] https://github.com/openai/openai-cookbook/blob/6a47d53c967a0...

observationist 6 days ago | parent | next [-]

Anyone making claims with a horizon beyond two months about structure or capabilities will be wrong - it's sama's job to show confidence and vision and calm stakeholders, but if you're paying attention to the field, the release and research cycles are still contracting, with no sign of slowing any time soon. I've followed AI research daily since GPT-2; the momentum is incredible, and even if the industry sticks with transformers, there are years of low-hanging fruit and incremental improvements left before things start slowing.

There doesn't appear to be anything that these AI models cannot do, in principle, given sufficient data and compute. They've figured out multimodality and complex integration, self play for arbitrary domains, and lots of high-cost longer term paradigms that will push capabilities forwards for at least 2 decades in conjunction with Moore's law.

Things are going to continue getting better, faster, and weirder. If someone is making confident predictions beyond those claims, it's probably their job.

sottol 6 days ago | parent | next [-]

Maybe that's true for absolute armchair-engineering outsiders (like me), but these models are in training for months, and training data is probably being prepared year(s) in advance. These models have a knowledge cut-off in 2024, so they have been in training for a while. There's no way sama did not have a good idea that this non-COT model was in the pipeline 2 months ago. It was probably finished training then and undergoing evals.

Maybe

1. he's just doing his job and hyping OpenAI's competitive advantages (afair most of the competition didn't have decent COT models in Feb), or

2. something changed and they're releasing models now that they didn't intend to release 2 months ago (maybe because a model they did intend to release is not ready and won't be for a while), or

3. COT is not really as advantageous as it was deemed to be 2+ months ago and/or computationally too expensive.

fragmede 6 days ago | parent [-]

With new hardware from Nvidia announced coming out, those months turn into weeks.

sottol 6 days ago | parent [-]

I doubt it's going to be weeks, the months were already turning into years despite Nvidia's previous advances.

(Not to say that it takes openai years to train a new model, just that the timeline between major GPT releases seems to double... be it for data gathering, training, taking breaks between training generations, ... - either way, model training seems to get harder not easier).

GPT Model | Release Date | Months Since Previous Model
GPT-1     | 11.06.2018   | -
GPT-2     | 14.02.2019   | 8.16
GPT-3     | 28.05.2020   | 15.43
GPT-4     | 14.03.2023   | 33.55

[1] https://www.lesswrong.com/posts/BWMKzBunEhMGfpEgo/when-will-...
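The gaps in the table above can be reproduced with a quick date calculation. (The release dates are taken from the table; the 30.4-day month is an assumption chosen to match the table's figures.)

```python
from datetime import date

# Major GPT release dates, as listed in the table (DD.MM.YYYY)
releases = [
    ("GPT-1", date(2018, 6, 11)),
    ("GPT-2", date(2019, 2, 14)),
    ("GPT-3", date(2020, 5, 28)),
    ("GPT-4", date(2023, 3, 14)),
]

# Months between consecutive releases, using a ~30.4-day month
for (prev_name, prev_date), (name, rel_date) in zip(releases, releases[1:]):
    days = (rel_date - prev_date).days
    print(f"{prev_name} -> {name}: {days / 30.4:.2f} months")
# GPT-1 -> GPT-2: 8.16 months
# GPT-2 -> GPT-3: 15.43 months
# GPT-3 -> GPT-4: 33.55 months
```

Each gap is roughly double the previous one, which is the doubling pattern described above.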

observationist 6 days ago | parent | next [-]

The capabilities and general utility of the models are increasing on an entirely different trajectory than model names - the information you posted is 99% dependent on internal OAI processes and market activities as opposed to anything to do with AI.

I'm talking more broadly, as well, including consideration of audio, video, and image modalities, general robotics models, and the momentum behind applying some of these architectures to novel domains. Protocols like MCP and automation tooling are rapidly improving, with media production and IT work rapidly being automated wherever possible.

When you throw in the chemistry and materials science advances, protein modeling, etc., we have enormously powerful AI with insufficient compute and expertise to apply it to everything we might want to. We have research being done on alternate architectures, and optimization being done on transformers that is rapidly reducing the cost/performance ratio. There are models that you can run on phones that would have been considered AGI 10 years ago, and there doesn't seem to be any fundamental principle decreasing the rate of improvement yet.

If alternate architectures like RWKV get funded, there might be several orders of magnitude of improvement with relatively little disruption to production model behaviors, but other architectures like text diffusion could obsolete a lot of the ecosystem being built up around LLMs right now.

There are a million little considerations pumping transformer LLMs right now because they work and there's every reason to expect them to continue improving in performance and value for at least a decade. There aren't enough researchers and there's not enough compute to saturate the industry.

fragmede 5 days ago | parent | prev [-]

Fair point, I guess my question is how long it would take them to train GPT-2 on the absolute bleedingest generation of Nvidia chips vs what they had in 2019, with the budget they have to blow on Nvidia supercomputers today.

authorfly 6 days ago | parent | prev | next [-]

> the release and research cycles are still contracting

Release cycles, maybe, but not necessarily progress on the benchmarks you'd look at for the broader picture (MMLU etc.).

GPT-3 was an amazing step up from GPT-2, something scientists in the field really thought was at least 10-15 years out, done in 2, and instruct/RHLF for GPTs made a similarly massive splash, making the second half of 2021 equally amazing.

However nothing since has really been that left field or unpredictable from then, and it's been almost 3 years since RHLF hit the field. We knew good image understanding as input, longer context, and improved prompting would improve results. The releases are common, but the progress feels like it has stalled for me.

What really has changed since Davinci-instruct or ChatGPT to you? When making an AI-using product, do you construct it differently? Are agents presently more than APIs talking to databases with private fields?

hectormalot 6 days ago | parent | next [-]

In some dimensions I recognize the slow down in how fast new capabilities develop, but the speed still feels very high:

Image generation suddenly went from gimmick to useful now that prompt adherence is so much better (eagerly waiting for that to be in the API)

Coding performance continues to improve noticeably (for me). Claude 3.7 felt like a big step from 4o/3.5, and Gemini 2.5 in a similar way. Compared to just 6 months ago, I can give it bigger and more complex pieces of work and get relatively good output back. (Net acceleration)

Audio-2-audio seems like it will be a big step as well. I think this has much more potential than the STT-LLM-TTS architecture commonly used today (latency, quality)
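As a toy illustration of the latency argument against the cascaded approach: in an STT-LLM-TTS pipeline, each stage must finish (or at least start producing output) before the next begins, so latencies stack. The stage functions and sleep times below are made-up placeholders, not real APIs.

```python
import time

def speech_to_text(audio: bytes) -> str:
    time.sleep(0.3)          # pretend transcription latency
    return "transcribed user request"

def llm_reply(prompt: str) -> str:
    time.sleep(0.8)          # pretend generation latency
    return "model reply"

def text_to_speech(text: str) -> bytes:
    time.sleep(0.4)          # pretend synthesis latency
    return b"reply audio"

# The cascaded pipeline: each stage waits on the previous one,
# so the user hears nothing until all three latencies have summed.
start = time.perf_counter()
reply_audio = text_to_speech(llm_reply(speech_to_text(b"user audio")))
print(f"cascaded latency: {time.perf_counter() - start:.1f}s")  # ~1.5s
```

An audio-to-audio model collapses the three stages into one, so there is a single model latency instead of a sum, and no lossy text bottleneck between stages.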

kadushka 6 days ago | parent | prev | next [-]

I see a huge progress made since the first gpt-4 release. The reliability of answers has improved an order of magnitude. Two years ago, more than half of my questions resulted in incorrect or partially correct answers (most of my queries are about complicated software algorithms or phd level research brainstorming). A simple “are you sure” prompt would force the model to admit it was wrong most of the time. Now with o1 this almost never happens and the model seems to be smarter or at least more capable than me - in general. GPT-4 was a bright high school student. o1 is a postdoc.

liamwire 6 days ago | parent | prev [-]

Excuse the pedantry; for those reading, it’s RLHF rather than RHLF.

moojacob 6 days ago | parent | prev [-]

> Things are going to continue getting better, faster, and weirder.

I love this. Especially the weirder part. This tech can be useful in every crevice of society and we still have no idea what new creative use cases there are.

Who would’ve guessed phones and social media would cause mass protests because bystanders could record and distribute videos of the police?

staunton 6 days ago | parent [-]

> Who would’ve guessed phones and social media would cause mass protests because bystanders could record and distribute videos of the police?

That would have been quite far down on my list of "major (unexpected) consequences of phones and social media"...

ewoodrich 6 days ago | parent [-]

Yep, it’s literally just a slightly higher tech version of (for example) the 1992 Los Angeles riots over Rodney King but with phones and Facebook instead of handheld camcorders and television.

wongarsu 6 days ago | parent | prev | next [-]

Maybe that's why they named this model 4.1, despite coming out after 4.5 and supposedly outperforming it. They can pretend GPT-4.5 is the last non-chain-of-thought model by just giving all non-chain-of-thought-models version numbers below 4.5

chrisweekly 6 days ago | parent [-]

Ok, I know naming things is hard, but 4.1 comes out after 4.5? Just, wat.

CamperBob2 6 days ago | parent [-]

For a long time, you could fool models with questions like "Which is greater, 4.10 or 4.5?" Maybe they're still struggling with that at OpenAI.
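The question is genuinely ambiguous, which is part of why it trips models up: as decimal numbers 4.10 < 4.5, but as dotted version strings 4.10 sorts after 4.5. A minimal sketch of the two readings (the `version_key` helper is just an illustration):

```python
def version_key(v: str) -> tuple:
    # Split a dotted version string into a tuple of integers,
    # so "4.10" becomes (4, 10) and compares componentwise.
    return tuple(int(part) for part in v.split("."))

# Decimal reading: 4.10 is just 4.1, which is less than 4.5
print(float("4.10") < float("4.5"))              # True

# Version reading: (4, 10) comes after (4, 5)
print(version_key("4.10") < version_key("4.5"))  # False
```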

ben_w 6 days ago | parent [-]

At this point, I'm just assuming most AI models — not just OpenAI's — name themselves. And that they write their own press releases.

Cheer2171 6 days ago | parent | prev | next [-]

Why would you believe a single word Sam Altman says?

sigmoid10 6 days ago | parent [-]

Everyone assumed malice when the board fired him for not always being "candid" - but it seems more and more that he's just clueless. He's definitely capable when it comes to raising money as a business, but I wouldn't count on any tech opinion from him.

zitterbewegung 6 days ago | parent | prev | next [-]

I think that people balked at the cost of 4.5 and really wanted just a slightly improved 4o. Now it seems that they will have separate product lines for non-chain-of-thought and chain-of-thought models, which actually makes sense because some people want a cheap model and some don't.

freehorse 6 days ago | parent | prev | next [-]

> Deprecating GPT-4.5 less than 2 months after introducing it also seems unlikely to be the original plan.

Well, they actually already hinted at possible deprecation in their initial announcement of GPT-4.5 [0]. Also, as others said, this model was already offered in the API as chatgpt-latest, but there was no checkpoint, which made it unreliable for actual use.

[0] https://openai.com/index/introducing-gpt-4-5/#:~:text=we%E2%...

resource_waste 6 days ago | parent | prev | next [-]

When I saw them say 'no more non-COT models', I was mildly panicked.

While their competitors have made fantastic models, at the time I perceived ChatGPT4 to be the best model for many applications. COT models were often tricked by my prompts, assuming things to be true, when a non-COT model would say something like 'That isn't necessarily the case'.

I use both COT and non when I have an important problem.

Seeing them keep a non-COT model around is a good idea.

6 days ago | parent | prev | next [-]
[deleted]
adamgordonbell 6 days ago | parent | prev [-]

Perhaps it is a distilled 4.5, or based on its lineage, as some suggested.