Beats opus 4.6! They missed claiming the frontier by a few days.

NitpickLawyer 6 hours ago | parent | next [-]

While I'm skeptical of any "beats Opus" claims (many have been made, none turned out to be true), I still think it's insane that a small team can now run close-to-SotA models locally on ~$100k worth of hardware and be 100% sure that the data stays local. Should be a no-brainer for teams that work in areas where privacy matters.

cedws 6 hours ago | parent | next [-]

Even the smaller quantized models which can run on consumer hardware pack in an almost unfathomable amount of knowledge. I don't think I expected to be able to run a 'local Google' in my lifetime before the LLM boom.

	sterlind 4 hours ago | parent [-]
		I'm extremely curious how these models learn to pack a lossily-compressed representation of the entire Internet (more or less) into a few hundred billion parameters. like, what's the ontology?

osti 6 hours ago | parent | prev [-]

I think this one only needs about 600GB of VRAM, so it could fit on two Mac Studios with 512GB of VRAM each. Those would have cost (though they're no longer available) something under $20k.

NitpickLawyer 6 hours ago | parent | next [-]

Yeah, but that's personal use at best; not much agentic anything is happening on that hardware. Macs are great for small models at small-to-medium context lengths, but at > 64k (very common with agentic usage) they struggle and slow down a lot.

The ~$100k hardware is suitable for multi-user, small-team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.

	osti 3 hours ago | parent [-]
		True, but I think for local models, we are mostly considering personal usage.

zozbot234 5 hours ago | parent | prev [-]

You could run it with SSD offload; earlier experiments with Kimi 2.5 on M5 hardware had it running at 2 tok/s. K2.6 has a similar number of total and active parameters.

	osti 3 hours ago | parent [-]
		Yeah... I would definitely call 2 t/s unusable. For simple chats, I'd want at least 15 t/s. For agentic coding (which this model is advertised for), I'd want good prefill performance as well.

pixel_popping 6 hours ago | parent | prev | next [-]

It doesn't beat Opus 4.6, no way, don't be fooled by benchmarks.

BoorishBears 6 hours ago | parent | prev [-]

Opus is clearly a sidegrade meant to help Anthropic manage cost, so I would say they may have the frontier if it actually beats 4.6.

	irthomasthomas 6 hours ago | parent [-]
		Could be right. I just noticed my feed is absent the usual flood of posts demoing the new hotness on 3D modeling, game design and SVG drawings of animals on vehicles.