wrsh07 a day ago

My understanding of the original OpenAI and Anthropic labels was essentially: GPT-2 was 100x more compute than GPT-1, same for 2 to 3, and same for 3 to 4. Thus GPT-4.5, being a half step, was 10x (= √100) more compute than GPT-4.^

If Anthropic is doing the same thing, then 3.5 would be 10x more compute than 3, 3.7 might be ~3x more than 3.5, and 4 might be another ~3x.

^ I think this may involve the phrase "effective compute", so it might not be a full pretrain, but it might be! Using 10x more compute could mean, say, doubling the compute spent on pretraining and then spending 8x on post-training, or some other split.
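
To make that arithmetic concrete, here's a minimal Python sketch, assuming the "100x effective compute per full version number" convention and that fractional version bumps compose geometrically (my own extrapolation, not official figures):

    def compute_multiplier(v_from: float, v_to: float) -> float:
        """Effective-compute ratio implied by 100x per full version bump."""
        return 100.0 ** (v_to - v_from)

    print(compute_multiplier(3.0, 3.5))  # 10.0  (half step = sqrt(100))
    print(compute_multiplier(3.5, 3.7))  # ~2.5  (roughly the "~3x" guess above)
    print(compute_multiplier(3.7, 4.0))  # ~4.0  (another "~3x")
    print(compute_multiplier(4.0, 4.5))  # 10.0  (GPT-4 -> GPT-4.5)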

swyx a day ago | parent

Beyond 4, that's no longer true: marketing took over from the research.

wrsh07 a day ago | parent

Oh shoot, I thought that still applied to 4.5, just in a more "effective compute" way (not 10x more parameters, but 10x more compute in training).

But alas, it's not like "3nm" means the literal thing in chip fabrication either. Marketing always dominates (and not necessarily in a way that adds clarity).