Remix clone Hacker News

	▲	wild_egg 6 hours ago
		Where did you see a Haiku comparison? Haiku 4.5 was my daily driver for a month or so before Opus 4.5 dropped, and I would be unreasonably happy if a local model could give me similar capability.
	▲	daemonologist 6 hours ago
		I didn't see a direct comparison, but there's some overlap in the published benchmarks:

		                        │ Qwen 3.6 35B-A3B │ Haiku 4.5
		────────────────────────┼──────────────────┼────────────────────────
		SWE-Bench Verified      │ 73.4             │ 66.6
		────────────────────────┼──────────────────┼────────────────────────
		SWE-Bench Multilingual  │ 67.2             │ 64.7
		────────────────────────┼──────────────────┼────────────────────────
		SWE-Bench Pro           │ 49.5             │ 39.45
		────────────────────────┼──────────────────┼────────────────────────
		Terminal Bench 2.0      │ 51.5             │ 61.2 (Warp), 27.5 (CC)
		────────────────────────┼──────────────────┼────────────────────────
		LiveCodeBench           │ 80.4             │ 41.92

		These are of course all public benchmarks though - I'd expect there to be some memorization/overfitting happening. The proprietary models usually have a bit of an advantage in real-world tasks in my experience.
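		For anyone who wants the gaps rather than the raw scores, here is a minimal sketch that subtracts the Haiku 4.5 column from the Qwen 3.6 35B-A3B column. The numbers are copied from the table above; the names `SCORES` and `score_gaps` are made up for illustration. It also assumes the single Qwen Terminal Bench 2.0 score can be set against both Haiku harness results, which the table does not confirm.

```python
# Scores copied from the table above as (Qwen 3.6 35B-A3B, Haiku 4.5).
# Terminal Bench 2.0 lists two Haiku 4.5 results (Warp and CC harnesses);
# pairing the one Qwen score with each of them is an assumption.
SCORES = {
    "SWE-Bench Verified": (73.4, 66.6),
    "SWE-Bench Multilingual": (67.2, 64.7),
    "SWE-Bench Pro": (49.5, 39.45),
    "Terminal Bench 2.0 (Warp)": (51.5, 61.2),
    "Terminal Bench 2.0 (CC)": (51.5, 27.5),
    "LiveCodeBench": (80.4, 41.92),
}


def score_gaps(scores):
    """Return {benchmark: Qwen minus Haiku} in points, rounded to 2 decimals."""
    return {name: round(qwen - haiku, 2) for name, (qwen, haiku) in scores.items()}


if __name__ == "__main__":
    for name, gap in score_gaps(SCORES).items():
        leader = "Qwen" if gap > 0 else "Haiku"
        print(f"{name:<28} {gap:+7.2f}  ({leader} ahead)")
```

		A positive gap means Qwen scored higher on that benchmark. The memorization/overfitting caveat applies to these differences just as much as to the raw scores.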
	▲	coder543 6 hours ago
		Artificial Analysis hasn't posted their independent analysis of Qwen3.6 35B A3B yet, but Alibaba's benchmarks paint it as being on par with Qwen3.5 27B (or better in some cases). Even Qwen3.5 35B A3B benchmarks roughly on par with Haiku 4.5, so Qwen3.6 should be a noticeable step up.

		https://artificialanalysis.ai/models?models=gpt-oss-120b%2Cg...

		No, these benchmarks are not perfect, but short of trying it yourself, this is the best we've got. Compared to the frontier coding models like Opus 4.7 and GPT 5.4, Qwen3.6 35B A3B is not going to feel smart at all, but for something that can run quickly at home... it is impressive how far this stuff has come.
	▲	deaux 3 hours ago
		I find Gemma 4 26B A4B better than Haiku 4.5, and that's smaller than this one.