▲	john01dav 2 hours ago
		Why isn't self-hosting (even just renting a GPU server, not necessarily on-premises), or hosting via something like Together AI, more common at large companies as a way to run the open-weight models? I've tried the open-weight models and premium models like Opus and Gemini Pro. I find the latter a little better, but not nearly enough to justify the extreme price difference, since the differences largely don't matter for what I've used them for, and I expect many other users have similar use cases.
	▲	soleveloper 2 hours ago
		If the premium models are just about 10% better, that could justify the price vs. self-hosting a ~0.5-1T-parameter open-weights model. Remember that utilization of these huge racks will not be 24/7, and these are usually not GPU-intensive shops that would train models on the spare compute. With prices of 100-200k USD and up, and a ~2-year lifetime, that is hard to justify financially. Self-hosting could easily amount to ~1000 USD a month amortized across many developers. During rush hours there will be hard rate limits. Would saving 1500 - 1000 = 500 USD a month justify a 10% decrease in "AI productivity"? I guess not, in most cases. To everyone who asks me, I say that in the short term, unless there's a really good reason to self-host these coding assistant models, the big two or three coding assistant providers are the better choice. No one got fired for licensing Claude Code.
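A rough sketch of that arithmetic in Python, for anyone who wants to plug in their own numbers. The 150k USD rack, 24-month lifetime, 1500 vs. 1000 USD and 10% figures come from the comment above, read here as per-developer monthly costs. The developer count, monthly ops cost, and fully loaded developer cost are hypothetical values picked only so the example lands on ~1000 USD; the function names are made up for illustration.

```python
def amortized_monthly_cost(hardware_usd: float,
                           lifetime_months: int,
                           developers: int,
                           monthly_opex_usd: float = 0.0) -> float:
    """Per-developer monthly cost of a self-hosted rack.

    Uses straight-line amortization of the hardware over its lifetime,
    plus a flat monthly operating cost, split evenly across developers.
    """
    if lifetime_months <= 0 or developers <= 0:
        raise ValueError("lifetime_months and developers must be positive")
    return (hardware_usd / lifetime_months + monthly_opex_usd) / developers


def self_hosting_pays_off(self_host_usd: float,
                          premium_usd: float,
                          productivity_loss: float,
                          developer_monthly_cost_usd: float) -> bool:
    """True if the monthly savings exceed the value of the lost productivity."""
    savings = premium_usd - self_host_usd
    lost_output = productivity_loss * developer_monthly_cost_usd
    return savings > lost_output


# Assumed inputs: 150k USD rack, 2-year life, 10 developers,
# 3750 USD/month for power, hosting and upkeep (all but the first two are guesses).
per_dev = amortized_monthly_cost(150_000, 24, 10, monthly_opex_usd=3_750)

# Assumed fully loaded developer cost of 15k USD/month (hypothetical).
worth_it = self_hosting_pays_off(per_dev, 1_500, 0.10, 15_000)
```

With these inputs `per_dev` comes out to 1000.0 and `worth_it` is False: saving 500 USD a month does not cover a 10% productivity loss valued at 1500 USD. The conclusion flips only if the quality gap is much smaller than 10% or the rack is shared by far more developers.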
	▲	Jianghong94 2 hours ago
		I just went through a similar discussion at my $WORK (a traditional finance company on the NYSE with average IT expertise), and I think the thought process goes like this: it's one thing to give your stellar dev/hacker a beefy GPU server and let them run whatever model they can; it's another to maintain such a platform company-wide. You would need people (likely well above the normal software dev pay grade) to understand and maintain the models, the backend, availability, etc. All this extra hassle makes it easier to just pay a top-tier external lab and slap a reasonable spending limit on everybody.
	▲	esikich 2 hours ago
		Why do you think it would be more common? Pooling GPUs to serve multiple users and connecting them to docs and data lakes while respecting security controls is, for a start, non-trivial. You'd end up paying a team to manage that.
	▲	fg137 an hour ago
		For the same reasons companies don't build data centers for their "regular" hosting and storage needs but put things on AWS, Azure, etc. It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLMs, there is absolutely no reason for a company to serve models on its own hardware unless it is paranoid about sending bytes to AWS.
	▲	fg137 an hour ago
		> I've tried the open weight models ...
		You tried that on a personal machine, for yourself, once. It's a completely different calculation when serving a model to 3000 employees with ever-evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run it. A company will need to figure out how to manage acquisition, assets, and expenses, plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.
	▲	datsci_est_2015 2 hours ago
		There’s probably plenty of money to be made in LLMs as a service, but not enough time has passed for commoditization to occur. I’m with you: when the dust settles, I don’t think any of the frontier model providers will have a moat. Just as, during the dotcom boom, a catchy URL and a webpage that could accept payments wasn’t a moat either.
	▲	malfist 2 hours ago
		Where are you buying the GPUs to have enough compute to run a medium-sized business?