jdoliner 2 hours ago

I've seen a rumor going around that OpenAI hasn't had a successful pre-training run since mid 2024. This seemed insane to me, but if you give ChatGPT 5.1 a query about current events and instruct it not to use the internet, it will tell you its knowledge cutoff is June 2024. Not sure if maybe that's just the smaller model or what, but I don't think it's a good sign to get that from any frontier model today -- that's 18 months ago.

alecco 2 hours ago | parent | next [-]

SemiAnalysis said it last week and AFAIK it wasn't denied.

https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...

CamperBob2 34 minutes ago | parent [-]

That is... actually a seriously meaty article from a blog I've never heard of. Thanks for the pointer.

p1necone 2 hours ago | parent | prev | next [-]

Every so often I try out a GPT model for coding again, and manage to get tricked by the very sparse conversation style into thinking it's great for a couple of days. When it says nothing, then finishes producing code with an 'I did x, y and z', with no stupid 'you're absolutely right' sucking up, and the code works, it feels very good.

But I always realize it's just smoke and mirrors -- the actual quality of the code and the failure modes are just so much worse than Claude and Gemini.

kshacker 42 minutes ago | parent | next [-]

I am a novice programmer -- I have programmed for 35+ years now, but I build and lose the skills moving between coder, manager, and sales roles -- multiple times. Fresh IC since last week again :) I have coded starting with Fortran, RPG and COBOL, and I have also coded Java and Scala. I know modern architecture but haven't done enough grunt work to make it work or to debug (and fix) a complex problem. Needless to say, sometimes my eyes glaze over the code.

And I write some code for my personal enjoyment, which I gave to Claude 6-8 months back for improvement. It gave me a massive change log, and the change was quite risky, so I abandoned it.

I tried this again with Gemini last week. I was more prepared and asked it to improve the code class by class, and for whatever reason I got better answers -- changed code, with explanations -- and when I asked it to split the refactor into smaller steps, it did so. It was a joy working on this over the Thanksgiving holidays. It could break the changes into small pieces, talk through them as I evolved concepts learned previously, take my feedback and prioritization, and also give me a nuanced explanation of the business objectives I was trying to achieve.

This is not to downplay Claude; that is just the sequence of events as they happened. So while it may or may not work well for experienced programmers, it is such a helpful tool for people who know the domain or the concepts (or both) but struggle with details, since the tool can iron out a lot of those details for you.

My goal now is to line up another project for the winter holidays and then think through 4-6 hour AI-assisted refactors over the weekends. Do note that this is a project of personal interest, so I'm not spending weekends for the big man.

bovermyer a few seconds ago | parent [-]

I have never considered trying to apply Claude/Gemini/etc. to Fortran or COBOL. That would be interesting.

tartoran an hour ago | parent | prev | next [-]

I'm starting with Claude at work but have had an okay experience with OpenAI so far. For clearly delimited tasks it does produce working code more often than not, and I've seen some improvement on their side compared to, say, last year. For something more complex and not clearly defined in advance, yes, it does produce plausible garbage and it goes off the rails a lot.

I was migrating a project and asked ChatGPT to analyze the original code base and produce a migration plan. The result seemed good and encouraging, mostly because I didn't know much about that project at the time. But I ended up taking a different route, and when I finished the migration (with bits of help from ChatGPT) I looked at the original migration plan out of curiosity, since I had become much more familiar with the project by then. That migration plan was an absolutely useless and senseless hallucination.

wahnfrieden 21 minutes ago | parent [-]

Use Codex for coding work

sharyphil 9 minutes ago | parent | prev | next [-]

You're absolutely right!

Somehow it doesn't get on my nerves (unlike Gemini with "Of course").

herpdyderp an hour ago | parent | prev | next [-]

On the contrary, I cannot use the top Gemini and Claude models because their outputs are so out of place and hard to integrate with my code bases. The GPT-5 models integrate with my code base's existing patterns seamlessly.

inquirerGeneral 28 minutes ago | parent [-]

[dead]

jpalomaki an hour ago | parent | prev | next [-]

Can you give a concrete example of a programming task GPT fails to solve?

Interested, because I've been getting pretty good results on different tasks using Codex.

cmarschner an hour ago | parent [-]

It completely failed for me at running the code it changed in a Docker container I keep running; Claude did it flawlessly. It absolutely rocks at code reviews, but it's terrible in comparison at generating code.

findjashua 43 minutes ago | parent | prev | next [-]

Not my experience at all -- 5.1 Codex has been the best by far.

manmal 41 minutes ago | parent [-]

How can you stand the excruciating slowness? Claude Code runs circles around Codex. The most mundane tasks make it think for a minute before doing anything.

wahnfrieden 20 minutes ago | parent [-]

By learning to parallelize my work. This also solved my problem with slow Xcode builds.

logicchains 43 minutes ago | parent | prev [-]

I find that for difficult math and design questions, GPT-5 tends to produce better answers than Claude and Gemini.

munk-a 20 minutes ago | parent [-]

Could you clarify what you mean by design questions? I do agree that GPT5 tends to have a better agentic dispatch style for math questions but I've found it has really struggled with data model design.

amluto 21 minutes ago | parent | prev | next [-]

I asked ChatGPT 5.1 to help me solve a silly installation issue with the codex command line tool (I’m not an npm user and the recommended installation method is some kludge using npm), and ChatGPT told me, with a straight face, that codex was discontinued and that I must have meant the “openai” command.

Coneylake 17 minutes ago | parent [-]

"with a straight face"

abixb 7 minutes ago | parent | next [-]

Anthropomorphizing non-human things is only human.

empiko 7 minutes ago | parent | prev [-]

:|

nickff 2 hours ago | parent | prev | next [-]

I recall reading that Google had similar 'delay' issues when crawling the web in 2000 and early 2001, but they managed to survive. That said, OpenAI seems much less differentiated (now) than Google was back then, so this may be a much riskier situation.

searls 2 hours ago | parent | prev | next [-]

Funny, it told me the same thing twice yesterday, and that was _with_ thinking + search enabled on the request (it apparently refused to carry out the search, which it does once in a blue moon).

I hadn't made the connection that the training data is that old, but that would indeed augur poorly.

manmal 43 minutes ago | parent | prev | next [-]

That would explain why it’s so bad with new Swift features and more recent ast-grep rules.

mr_00ff00 39 minutes ago | parent | prev | next [-]

What is a pre-training run?

abixb 35 minutes ago | parent | next [-]

The first step in building a large language model. That's when the model is initialized and trained on a huge dataset to learn patterns and whatnot. The "P" in "GPT" stands for "pre-trained."

bckr 34 minutes ago | parent | prev [-]

That’s where they take their big pile of data and train the model to do next-token-prediction.
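To make "next-token prediction" concrete, here is a toy sketch. This is a bigram counter, not a transformer, and the corpus and names are made up for illustration; real pre-training optimizes the same objective (predict the next token) over trillions of tokens with gradient descent.

```python
from collections import Counter, defaultdict

# Tiny stand-in for the "big pile of data".
corpus = "the cat sat on the mat the cat ate the food".split()

# "Training": count how often each token follows the previous one.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Return the most frequently observed token after `token`."""
    return counts[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" -- it follows "the" twice, vs. once for "mat"/"food"
```

A real model outputs a probability distribution over the whole vocabulary instead of a single argmax, but the "train on data, then predict the next token" shape is the same.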

nextworddev an hour ago | parent | prev | next [-]

Don’t forget that SemiAnalysis’s founder Dylan Patel is supposedly roommates with Anthropic’s RL tech lead Sholto.

nickysielicki an hour ago | parent [-]

The fundamental problem with bubbles like this is that you get people who are able to take advantage of the Gell-Mann amnesia effect, except the details they’re wrong about are so niche that there’s a vanishingly small group of people qualified to call them out, while there’s simultaneously far more attention on what they say, because investors and speculators are so desperate and anxious for new information.

I followed him on Twitter. He said some very interesting things, I thought. Then he started talking about the niche of ML/AI I work near, and he was completely wrong about it. I became enlightened.

simianparrot an hour ago | parent | prev | next [-]

Any data after that is contaminated with vast amounts of AI slop. Is anyone training on anything newer..?

throwaway314155 an hour ago | parent | prev [-]

It has no idea what its own knowledge cutoff is.