| ▲ | lm28469 4 hours ago |
> Wonder what am I doing wrong?
You're comparing 100B-parameter open models running on a consumer laptop vs. private models with at the very least 1T parameters running on racks of bleeding-edge professional GPUs. Local agentic coding is closer to "ship me the boilerplate for an Android app", not "deep research questions", especially on your machine.
|
| ▲ | vlovich123 4 hours ago | parent | next [-] |
The hardware difference explains runtime performance differences, not task performance. Speculation is that the frontier models are all below 200B parameters, but a 2x size difference wouldn't fully explain task performance differences.
| |
| ▲ | nl 12 minutes ago | parent | next [-] |
> Speculation is that the frontier models are all below 200B parameters
Some versions of some of the models are around that size, which you might hit for example with the ChatGPT auto-router. But the frontier models are all over 1T parameters. Source: interviews with people who have left one of the big three labs, now work at the Chinese labs, and talk about how to train 1T+ models.
|
| ▲ | NamlchakKhandro an hour ago | parent | prev | next [-] |
> The hardware difference explains runtime performance differences, not task performance.
Yes, it does.
|
| ▲ | ses1984 4 hours ago | parent | prev [-] |
Who would have thought AI labs with billions upon billions in R&D budget would have better models than a free alternative.
|
|
|
| ▲ | delaminator 4 hours ago | parent | prev [-] |
| Looks at the headline: Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers |
| |
| ▲ | lm28469 4 hours ago | parent [-] |
Yes, and Devstral 2 24B Q4 is supposed to be 90% as good, but it can't even reliably write to a file on my machine. There are the benchmarks, the promises, and then what everybody can try at home.
|
|