vlovich123 4 hours ago

The hardware difference explains runtime performance differences, not task performance.

Speculation is that the frontier models are all below 200B parameters, but a 2x size difference wouldn't fully explain the task performance differences.

nl 11 minutes ago | parent | next [-]

> Speculation is that the frontier models are all below 200B parameters

Some versions of some of the models are around that size, which you might hit, for example, with the ChatGPT auto-router.

But the frontier models are all over 1T parameters. Source: interviews with people who have left one of the big three labs and now work at the Chinese labs, talking about how to train 1T+ models.

NamlchakKhandro an hour ago | parent | prev | next [-]

> The hardware difference explains runtime performance differences, not task performance.

Yes it does.

ses1984 4 hours ago | parent | prev [-]

Who would have thought AI labs with billions upon billions in R&D budget would have better models than a free alternative?
