Remix clone Hacker News

		mobelkh 2 hours ago
		It is falling if you look elsewhere. DeepSeek made the 75% discount on its V4 models permanent. On one hand, LLM improvements such as MoE and hybrid attention make inference cheaper; on the other, we're getting more inference-focused chips that break the Nvidia monopoly.

		I don't think a lot of people know this, but a GPU cluster can serve multiple clients without much of a drop in performance, because token-by-token decoding is mostly limited by memory bandwidth and concurrent requests can be batched together. Even in a worst-case scenario, you band together with 6-16 people to run a server with 2-3 H100s hosting DeepSeek V4 Flash, or 4-6 H100s to run Pro, and you get roughly the same performance as if you ran it alone. This means a lot of companies can afford to put $50-100k into their own LLM server cluster.

		We're at a price point where, if you push it further, people will move. There's no real vendor lock-in: your agent config, skills, MCP servers, etc. are all reusable with other models and harnesses. So unless you get all providers to collude on a price hike, you risk an exodus of customers.
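The "shared server, same speed" point rests on decoding being memory-bandwidth bound: every step streams the active weights from HBM once, and all sequences in the batch share that read. Below is a rough roofline-style sketch of that idea, plus the trivial cost split. Every figure in it is an assumption for illustration: the 20B active parameters, BF16 weights, and approximate H100 SXM bandwidth/FLOP numbers are not DeepSeek V4 measurements. The names `Setup`, `step_time`, `crossover_batch`, and `cost_per_member` are hypothetical helpers. It ignores KV-cache traffic, attention cost, and inter-GPU communication, so real per-user throughput will be lower.

```python
"""Rough roofline estimate of why batching barely slows per-user decoding.

All hardware and model figures are assumptions for illustration only.
"""
from dataclasses import dataclass


@dataclass
class Setup:
    active_params: float    # active parameters per token (hypothetical MoE model)
    bytes_per_param: float  # 2.0 assumes BF16 weights
    num_gpus: int           # GPUs sharing the model (idealized perfect scaling)
    hbm_bandwidth: float    # bytes/s per GPU (approx. H100 SXM: 3.35e12)
    flops_per_gpu: float    # dense FLOP/s per GPU (approx. H100 SXM BF16: 9.9e14)


def step_time(s: Setup, batch: int) -> float:
    """Seconds per decode step when `batch` sequences are generated together.

    Simplifications: weights are read from HBM once per step regardless of
    batch size; compute is ~2 FLOPs per active parameter per token; KV-cache
    reads, attention, and communication overhead are ignored.
    """
    memory_time = s.active_params * s.bytes_per_param / (s.hbm_bandwidth * s.num_gpus)
    compute_time = 2 * s.active_params * batch / (s.flops_per_gpu * s.num_gpus)
    return max(memory_time, compute_time)


def tokens_per_sec_per_user(s: Setup, batch: int) -> float:
    # Each sequence in the batch receives one new token per step.
    return 1.0 / step_time(s, batch)


def crossover_batch(s: Setup) -> float:
    # Batch size where compute time catches up with weight-read time;
    # below it, adding users does not reduce per-user speed in this model.
    return s.bytes_per_param * s.flops_per_gpu / (2 * s.hbm_bandwidth)


def cost_per_member(server_cost: float, members: int) -> float:
    # Even split of the up-front hardware cost among the people sharing it.
    return server_cost / members


if __name__ == "__main__":
    # Hypothetical: 20B active params, BF16 weights, 2 H100-class GPUs.
    s = Setup(active_params=20e9, bytes_per_param=2.0, num_gpus=2,
              hbm_bandwidth=3.35e12, flops_per_gpu=9.9e14)
    for b in (1, 16, 256):
        print(f"batch={b:>3}: ~{tokens_per_sec_per_user(s, b):.0f} tok/s per user")
    print(f"memory-bound up to batch ~{crossover_batch(s):.0f}")
    print(f"$80k server split 8 ways: ${cost_per_member(80_000, 8):,.0f} each")
```

With these made-up numbers, per-user speed stays flat until a batch of roughly 300 sequences, which is the idealized version of the point above; in practice KV-cache growth and scheduling overhead erode it well before that.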