This is gross

It feels like we’ve been in the golden age and the window is coming to a close

Let the enshitification begin, I guess

How do you expect the spend & COGS for free LLM inference to be funded? For users who don't want to pay, or maybe can't pay?

▲

derektank 8 hours ago | parent | next [-]

Perhaps it’s a glib and easy thing to say, but after a teaser period, I would simply not offer free LLM inference. Agreeing to serve ads just completely re-aligns your interests away from providing the best possible user experience to something else entirely.

▲

infinite_spin 9 hours ago | parent | prev [-]

From things like defense/private contracts

e.g. colleges pay for institutional subscriptions

▲

2ndorderthought 8 hours ago | parent [-]

The average person doesn't benefit from defense contracts ... Like ever.

▲

IX-103 8 hours ago | parent [-]

The average person is slightly more female than male and has 2.1 children, but they do benefit from defense contracts since it makes up a small percentage of their salary.

	▲	2ndorderthought 8 hours ago \| parent [-]
		You are a fun person. We should be friends

▲

iammrpayments 8 hours ago | parent | prev | next [-]

It has begun ever since they nerfed chatgpt4 before releasing 4o

▲

2ndorderthought 9 hours ago | parent | prev | next [-]

In the past month local models have been ramping up in major way meanwhile the namesake providers have upped prices, went offline randomly, and started doing slimier and slimier things.

I really think the future is local compute. Or at least self hosted models.

▲

SchemaLoad 9 hours ago | parent | next [-]

The hosted ones still have the advantage of being able to search the internet for live info rather than being limited to a knowledge cut off date.

▲

gbear605 9 hours ago | parent | next [-]

I’m not sure why a model needs to be hosted in order to make network calls?

▲

hansvm 9 hours ago | parent [-]

Is there a library of good tools for LLMs to call? I have to imagine the bot-detection avoidance mechanisms are a major engineering effort and not likely to work out of the box with a simple harness and random local LLM.

▲

gbear605 5 hours ago | parent | next [-]

If your volume is low enough, it should be pretty fine. It can just piggy back onto your personal browser cookies for Cloudflare.

▲

ossa-ma 9 hours ago | parent | prev | next [-]

Even the hosted ones are blocked from searching certain sites, for example Claude is banned from searching Reddit:

`Error: "The following domains are not accessible to our user agent: ['reddit.com']."`

▲

wyre 9 hours ago | parent | prev [-]

Tavily, Exa, Firecrawl, Perplexity, and Linkup are all tools for agents to search the web.

I’ve been building a harness the past few months and supports them all out of the box with an API key.

▲

goosejuice 8 hours ago | parent [-]

Kagi also has an API. People who hate ads are probably the same folk that should be paying for Kagi. That's the sane alternative world where companies respect their users.

	▲	wyre 2 hours ago \| parent [-]
		Oh, you got me so excited. I've had a Kagi sub for 3 years, but their API is still in closed beta. I guess I could (and should reach out and ask for access).

▲

chrisweekly 7 hours ago | parent | prev | next [-]

That's not how it works. Whether local or hosted, every modern model has a cutoff date for its training data, and can be leveraged by agents / harnesses / tools to fetch context from the internet or wherever.

▲

darepublic 9 hours ago | parent | prev | next [-]

Local ones that support tool use can do the same

▲

eightysixfour 9 hours ago | parent | prev [-]

You can do that locally too!

▲

CSMastermind 9 hours ago | parent | prev [-]

What's the rough equivalent of a local model? Are we talking GPT-4?

	▲	2ndorderthought 8 hours ago \| parent \| next [-]
		Qwen 3.6 which was released this month is a large but still smaller model. Supposedly it's at about sonnet level when configured correctly. It can be run on commodity hardware without purchasing a data center. https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_... Then there are middle size ones which require multiple gpus which are like gpts latest flagships. Then there is kimi 2.6 which is a monster that is beating opus in some benchmarks. https://www.reddit.com/r/LocalLLaMA/comments/1sr8p49/kimi_k2... It's basically whatever you can afford. Any trash heap laptop can run code auto complete models locally no problem. The rest require some level of investment, an idle gaming pc, or a serious investment
	▲	Terretta 9 hours ago \| parent \| prev \| next [-]
		Depends on your VRAM or "unified" memory for how smart it is, and CPU/GPU for how quick it is. 128GB of RAM? Sure, the early to mid 4s releases, except maybe 4o. And on an M5 Max, about the same speed. I wouldn't really bother under 64GB (meaning 32GB or less) except for entertainment value (chats, summaries, tasky read-only agent things).
	▲	kay_o 9 hours ago \| parent \| prev [-]
		GLM 5.1 and DeepSeek 4 are acceptable, but the cost of hardware and energy cost that depending on your use case you may as well purchase a Tokens. They get useless and stupid rapidilty if you quant enough to run on single 16-24GB GPU style.

▲

rnxrx 9 hours ago | parent | prev [-]

The arc of the technological universe is short, but it bends toward enshitification.