Yep, this is a glimpse into the future of 500+ t/s, which is in my opinion the next big thing that validates the Jevons paradox (the models are already smart enough).

digitaltrees 3 hours ago | parent | next [-]

I think what we are glimpsing is exclusive access. So much for the "open" in OpenAI. If this technology really transforms society in the ways expected, with inequality as an unavoidable consequence, then equal access should be required, as it was for internet access (ISPs can't give preference to specific users' traffic).

paxys 3 hours ago | parent | prev | next [-]

Faster tokens = more reasoning loops, so it can actually make the models smarter as well.

	girvo 19 minutes ago | parent [-]
		Yeah! So at a much smaller scale, being able to boost Step 3.7 Flash up to 40 tk/s on my Spark-alike with proper triple-head MTP was the thing that made it superior to Qwen 3.6 27B in wall-clock time, despite Step reasoning more. A lot of the open Chinese models get their results through huge reasoning loops. Being able to boost decode perf is what will make them worth it, and I'm sure OpenAI and Anthropic could do similar (if they aren't already).
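
To make the wall-clock point concrete, here is a minimal sketch of the arithmetic: decode time is roughly tokens generated divided by tokens per second, so a model that reasons longer can still finish first if it decodes fast enough. Only the 40 tk/s figure comes from the comment above; the token counts and the 15 tk/s rate are hypothetical values picked for illustration, and `wall_clock_seconds` is a made-up helper name.

```python
def wall_clock_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to decode `total_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


# Hypothetical workload: the "heavy" model emits twice as many reasoning
# tokens but decodes at 40 tk/s; the "light" model emits fewer tokens at an
# assumed 15 tk/s. These are illustrative numbers, not measurements.
heavy_reasoner = wall_clock_seconds(total_tokens=8_000, tokens_per_second=40.0)
light_reasoner = wall_clock_seconds(total_tokens=4_000, tokens_per_second=15.0)

print(f"heavy reasoner: {heavy_reasoner:.0f} s")
print(f"light reasoner: {light_reasoner:.0f} s")
```

Under these assumed numbers the heavier reasoner still finishes first. This ignores prompt processing time and any variation in decode rate over the course of a response.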

devmor 3 hours ago | parent | prev | next [-]

“Smart enough” really depends on how many other people have encountered a problem close enough to yours and solved it somewhere on the open internet, IMO.

Most of the frontier models can, when prompted and tooled correctly, do a lot of “reasoning” tasks that amount to resolving how the user has explained a particular widely known paradigm.

The more difficult and obscure the issues you provide them with, the faster you notice them reward hacking by altering the criteria until they are no longer attempting to solve the problem. Using "advisor" style loops helps hold this off at the cost of tokens, but there is still a fairly short limit at which they will essentially give up if they can't find all of the necessary information. Sometimes the issue is actually worse if they find a small amount of information instead of nothing: they'll extrapolate from that tiny piece of data and generate plausible-sounding hallucinations almost every time.

And god forbid your problem involves doing something a different way than the majority of people do it. Unless you can write a full spec on it, the models will repeatedly spiral back into adjusting everything about your problem until it matches one of the most popular approaches in their training data.

vb-8448 3 hours ago | parent | next [-]

> how many other people have encountered a problem close enough to yours and solved it somewhere on the open internet

I'm 100% sure that all our web, cc, codex or whatever sessions are used in training, RL, or both.

This makes the universe that models know about at least an order of magnitude bigger than the open internet.

beepbooptheory 3 hours ago | parent | next [-]

I get how this is a truism now, but I never really understood why it would be useful to scrape cc/codex sessions for training. The relative amount of human input in those is so low (isn't that why they are so loved and used?), so how could it actually be useful to them? Wouldn't you wanna focus on people not using it?

	swiftcoder 2 hours ago | parent | next [-]
		It's more useful as a set of feedback on the model results. You can do sentiment analysis on the user responses to see if they found the model results useful/frustrating/etc and use that to guide future training
	vb-8448 2 hours ago | parent | prev [-]
		Because you provide them with the "problem" and the "solution" and once you have both you can scale your RL pipeline.

nathan_compton 3 hours ago | parent | prev [-]

I think this is a rosy estimate. The vast majority of what people do with these models is just the same old shit. I would be surprised if 1% of it were genuinely novel stuff worth folding back into the training data.

	vb-8448 2 hours ago | parent [-]
		Even if it "is just the same old shit", they have much more data, and of much higher quality, to scale the RL pipeline.

smokel 2 hours ago | parent | prev [-]

This may have been the case one year ago, but with contemporary models such as Opus, I run into this less often.
