Remix clone Hacker News

	▲	mapontosevenths 4 hours ago
		There's a guy on YouTube named Bijan Bowen who tests all the models (open and frontier) on a series of one-shot and few-shot programming exercises, and has been doing so for a long while now. You can pretty much watch him compare the results for any two models you're likely to be interested in. I'm not affiliated; I just like his style and have found it handy. I know it's not very rigorous, but it's good enough for me, and I've found his examples match the results I see in real life pretty closely.
	▲	lambda 4 hours ago
		OK, it looks like he did a browser OS test with both Claude 4 Opus and Qwen 3.6 35B-A3B.
		Claude 4 Opus: https://youtu.be/J7omabtqnBM?t=193
		Qwen 3.6 35B-A3B: https://youtu.be/gVU-DQeqkI0?t=215
		Qwen 3.6 produced far more working functionality than Claude 4 Opus did. Obviously, this is just one test of a single one-shot prompt for a silly toy OS, but yeah, this particular test shows Qwen 3.6 running locally dramatically outperforming Claude 4 Opus, which was a frontier model a year ago.