aeve890 7 hours ago
Please correct me if I'm wrong, I'm totally out of my field here, but what's the point of sota models that can only be run by hyperscalers? I mean, glm-5.2 is open source, but with 1.5TB in weights, who can really run it? It still needs dozens of H100s. Those 753B parameters quantized down to Q4 (~400 GB) would still require datacenter levels of hardware. Even Q2 would require serious hardware, way out of reach for most users, and you'd be far from the sota benchmarks of the full-precision model. I get it, it's open source, but that's not quite democratizing LLMs for anyone except compute providers. It's not like, let's say, Kubernetes. I can run k8s fully in my shitty homelab, without "quantization", exactly like Google does in their datacenters.
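For reference, the sizes quoted above can be sanity-checked with back-of-the-envelope arithmetic. This is only a rough sketch under stated assumptions: that "full precision" means 16 bits per weight (which is what 753B parameters and ~1.5TB imply), that Q4 and Q2 are exactly 4 and 2 bits per weight (real quantization formats usually store extra scale data, so actual files come out somewhat larger), and that an H100 has 80 GB of memory. The function names are made up for illustration, and the GPU counts cover the weights only, so they are a lower bound that ignores KV cache and activations.

```python
import math

def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8

def min_gpus_for_weights(size_gb: float, gpu_mem_gb: float = 80.0) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(size_gb / gpu_mem_gb)

PARAMS_B = 753  # parameter count from the comment above, in billions

# Assumed bit widths; real Q4/Q2 formats carry some per-block overhead.
for label, bits in [("16-bit", 16), ("Q4 (4-bit)", 4), ("Q2 (2-bit)", 2)]:
    size = weight_size_gb(PARAMS_B, bits)
    gpus = min_gpus_for_weights(size)
    print(f"{label:>11}: {size:7.1f} GB, at least {gpus} x 80 GB GPUs")
```

With those assumptions the 16-bit figure lands at about 1.5 TB and the 4-bit figure a little under 400 GB, which is consistent with the numbers in the comment once format overhead is added.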
danny_codes 4 hours ago
SOTA models can be run by anybody with compute capacity. You can pay for GLM 5.2 inference right now via Fireworks AI and presumably several dozen other providers. So if you don't want vendor lock-in and rug-pulling (Anthropic has churned on their subscription model like 4-5 times in the past month), you can just pay an inference provider and have far more control over your environment.
skulk 6 hours ago
Even if you have a ton of capital, you still can't spin up Claude Opus and compete on price with Anthropic using your fancy new optimizations. With open models you can, and that is great for consumers.