stingraycharles a day ago

I don’t think it will ever make sense; you can buy so much cloud based usage for this type of price.

From my perspective, the biggest problem is that I am just not going to be using it 24/7. Which means I’m not getting nearly as much value out of it as the cloud based vendors do from their hardware.

Last but not least, if I want to run queries against open source models, I prefer to use a provider like Groq or Cerebras as it’s extremely convenient to have the query results nearly instantly.

websiteapi a day ago | parent | next [-]

My issue is that once you have it in your workflow, you'd be pretty latency sensitive. Imagine those record-it-all apps working well: eventually you'd become pretty reliant on them, and I don't necessarily want to be at the whims of the cloud.

stingraycharles a day ago | parent [-]

Aren’t those “record it all” applications implemented as RAG (retrieval-augmented generation), with snippets injected into the context based on embedding similarity?

Obviously you’re not going to always inject everything into the context window.
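A minimal sketch of what I'm assuming here: each stored snippet has an embedding vector, and at query time only the top-k most similar snippets get injected into the context. The tiny two-dimensional "embeddings" below are fabricated purely for illustration; real systems use a learned embedding model.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_snippets(query_vec, store, k=2):
    """store: list of (embedding, text) pairs; returns the k most similar texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Toy store with fake 2-D embeddings (illustration only).
store = [
    ([1.0, 0.0], "meeting notes from Monday"),
    ([0.0, 1.0], "grocery list"),
    ([0.9, 0.1], "follow-up on Monday's meeting"),
]
context = top_k_snippets([1.0, 0.0], store, k=2)
```

Only `context` (not the whole store) would then be prepended to the prompt, which is why everything never ends up in the context window at once.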

lordswork a day ago | parent | prev | next [-]

As long as you're willing to wait up to an hour for your GPU to get scheduled when you do want to use it.

stingraycharles a day ago | parent [-]

I don’t understand what you’re saying. What’s preventing you from using e.g. OpenRouter to run a query against Kimi-K2 from whatever provider?
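For concreteness, a minimal sketch of that kind of query: OpenRouter exposes an OpenAI-compatible chat-completions endpoint, and the model slug `moonshotai/kimi-k2` plus the placeholder API key below are assumptions to check against OpenRouter's own docs. The sketch only builds the request; the line that actually sends it is commented out.

```python
import json
import urllib.request

def build_request(api_key, prompt, model="moonshotai/kimi-k2"):
    # Construct a chat-completions request against OpenRouter's
    # OpenAI-compatible API; nothing is sent over the network here.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("sk-or-...", "Summarize this meeting transcript.")
# body = urllib.request.urlopen(req).read()  # uncomment to actually send
```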

hu3 a day ago | parent | next [-]

and you'll get a faster model this way

bgwalter a day ago | parent | prev [-]

Because you have Cloudflare (MITM 1), OpenRouter (MITM 2), and finally the "AI" provider, all of whom can read, store, analyze, and resell your queries.

EDIT: Thanks for downvoting what is literally one of the most important reasons for people to use local models. Denying and censoring reality does not prevent the bubble from bursting.

irthomasthomas a day ago | parent [-]

you can use chutes.ai's TEE (Trusted Execution Environment), and Kimi K2 is running at about 100 t/s right now

givinguflac a day ago | parent | prev [-]

I think you’re missing the whole point, which is not using cloud compute.

stingraycharles a day ago | parent [-]

Because of privacy reasons? Yeah, I’m not going to spend a small fortune just to be able to use these types of models privately.

givinguflac a day ago | parent [-]

There are plenty of examples and reasons to do so besides privacy: because one can, because it’s cool, for research, for fine-tuning, etc. I never mentioned privacy. Your use case is not everyone’s.

wyre a day ago | parent [-]

You can still do all of those things by renting AI server compute, though? I think privacy and cool factor are the only real reasons it would be rational for someone to spend (*checks the Apple store*) $19,000 on computer hardware...

givinguflac 8 hours ago | parent [-]

Why do you look at this as a consumer? Have you never heard of businesses spending money on hardware?