2ndorderthought 9 hours ago

In the past month, local models have been ramping up in a major way, while the name-brand providers have raised prices, gone offline randomly, and started doing slimier and slimier things.

I really think the future is local compute. Or at least self hosted models.

SchemaLoad 9 hours ago | parent | next [-]

The hosted ones still have the advantage of being able to search the internet for live info rather than being limited to a knowledge cut off date.

gbear605 9 hours ago | parent | next [-]

I’m not sure why a model needs to be hosted in order to make network calls?

hansvm 9 hours ago | parent [-]

Is there a library of good tools for LLMs to call? I have to imagine the bot-detection avoidance mechanisms are a major engineering effort and not likely to work out of the box with a simple harness and random local LLM.

gbear605 5 hours ago | parent | next [-]

If your volume is low enough, it should be pretty fine. It can just piggyback on your personal browser's Cloudflare cookies.

ossa-ma 9 hours ago | parent | prev | next [-]

Even the hosted ones are blocked from searching certain sites, for example Claude is banned from searching Reddit:

`Error: "The following domains are not accessible to our user agent: ['reddit.com']."`

wyre 9 hours ago | parent | prev [-]

Tavily, Exa, Firecrawl, Perplexity, and Linkup are all tools for agents to search the web.

I’ve been building a harness for the past few months, and it supports them all out of the box with an API key.
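For anyone curious what "supports them all" looks like structurally: harnesses like this usually normalize providers behind one interface and pick an adapter by name. A minimal sketch, where the provider class and its request shape are illustrative stubs, not the real Tavily/Exa/Firecrawl APIs:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


class SearchProvider(Protocol):
    """Common interface every provider adapter implements."""
    def search(self, query: str, max_results: int = 5) -> list[SearchResult]: ...


class StubProvider:
    """Illustrative stand-in; a real adapter would call the provider's
    HTTP API with the user's key and map its JSON into SearchResult."""
    def __init__(self, api_key: str):
        self.api_key = api_key

    def search(self, query: str, max_results: int = 5) -> list[SearchResult]:
        results = [SearchResult(title=f"result for {query}",
                                url="https://example.com", snippet="...")]
        return results[:max_results]


def pick_provider(name: str, api_key: str) -> SearchProvider:
    # Registry keyed by provider name; each entry would be a real adapter.
    registry = {"stub": StubProvider}
    return registry[name](api_key)
```

The payoff is that the agent loop only ever sees `SearchProvider.search()`, so swapping Tavily for Exa is a one-line registry change.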

goosejuice 8 hours ago | parent [-]

Kagi also has an API. People who hate ads are probably the same folks who should be paying for Kagi. That's the sane alternative world where companies respect their users.

wyre 2 hours ago | parent [-]

Oh, you got me so excited. I've had a Kagi sub for 3 years, but their API is still in closed beta. I guess I could (and should) reach out and ask for access.

chrisweekly 7 hours ago | parent | prev | next [-]

That's not how it works. Whether local or hosted, every modern model has a cutoff date for its training data, and can be leveraged by agents / harnesses / tools to fetch context from the internet or wherever.

darepublic 9 hours ago | parent | prev | next [-]

Local ones that support tool use can do the same
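To sketch what that means in practice: local servers like Ollama, llama.cpp, and vLLM expose an OpenAI-compatible endpoint that accepts the same tool schemas as the hosted APIs, so you hand the model a search tool and dispatch its calls yourself. Everything below is standard OpenAI-style tool calling; the `search_web` body is a placeholder you'd wire to any backend, and the endpoint URL is the usual Ollama default, not something special:

```python
import json

# Tool schema in the OpenAI function-calling format, which local servers
# also accept on /v1/chat/completions.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]


def search_web(query: str) -> str:
    # Placeholder: plug in any search backend (a paid API, a local
    # SearxNG instance, ...) here.
    return json.dumps([{"title": "stub result", "url": "https://example.com"}])


def dispatch_tool_call(call: dict) -> str:
    """Route a model-issued tool call to the matching local function."""
    handlers = {"search_web": search_web}
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])
    return handlers[name](**args)

# The chat loop itself (not run here) POSTs messages plus TOOLS to e.g.
# http://localhost:11434/v1/chat/completions; whenever the reply contains
# tool_calls, it feeds dispatch_tool_call(...) results back as
# role="tool" messages until the model produces a final answer.
```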

eightysixfour 9 hours ago | parent | prev [-]

You can do that locally too!

CSMastermind 9 hours ago | parent | prev [-]

What's the rough equivalent of a local model? Are we talking GPT-4?

2ndorderthought 8 hours ago | parent | next [-]

Qwen 3.6, released this month, is on the large side but still smaller than the frontier models. Supposedly it's at about Sonnet level when configured correctly, and it can be run on commodity hardware without buying a data center. https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_...

Then there are mid-size models that require multiple GPUs and are comparable to GPT's latest flagships.

Then there is Kimi 2.6, a monster that is beating Opus in some benchmarks. https://www.reddit.com/r/LocalLLaMA/comments/1sr8p49/kimi_k2...

It's basically whatever you can afford. Any trash-heap laptop can run code-autocomplete models locally, no problem. The rest take anything from an idle gaming PC up to a serious investment.

Terretta 9 hours ago | parent | prev | next [-]

Depends on your VRAM or "unified" memory for how smart it is, and CPU/GPU for how quick it is.

128GB of RAM? Sure, the early to mid 4s releases, except maybe 4o. And on an M5 Max, about the same speed.

I wouldn't really bother under 64GB (meaning 32GB or less) except for entertainment value (chats, summaries, tasky read-only agent things).
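The sizing logic behind those RAM tiers is simple enough to do on a napkin: weights take roughly (parameter count × bits per weight ÷ 8) bytes, plus some overhead for the KV cache and runtime buffers. A rough sketch, where the 20% overhead figure is a loose assumption rather than a measured number:

```python
def approx_vram_gb(params_billions: float, bits_per_weight: float,
                   overhead: float = 0.20) -> float:
    """Very rough VRAM / unified-memory estimate: raw weight size plus
    a fudge factor for KV cache and runtime buffers."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9


# A 70B model at 4-bit quantization: 35 GB of weights, ~42 GB total,
# so it squeezes into 64 GB of unified memory but not 32 GB.
print(round(approx_vram_gb(70, 4), 1))  # → 42.0
```

The same arithmetic shows why 128 GB opens up a different class of model: at 4-bit you can hold roughly 200B parameters of weights with room left over for context.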

kay_o 9 hours ago | parent | prev [-]

GLM 5.1 and DeepSeek 4 are acceptable, but between the hardware and energy costs, depending on your use case you may as well just buy tokens. They get useless and stupid rapidly if you quantize them enough to run on a single 16-24GB GPU.