2ndorderthought 9 hours ago

In the past month, local models have been ramping up in a major way, while the name-brand providers have raised prices, gone offline randomly, and started doing slimier and slimier things.

I really think the future is local compute. Or at least self hosted models.

SchemaLoad 9 hours ago | parent | next [-]

The hosted ones still have the advantage of being able to search the internet for live info rather than being limited to a knowledge cut off date.

gbear605 9 hours ago | parent | next [-]

I’m not sure why a model needs to be hosted in order to make network calls?

hansvm 9 hours ago | parent [-]

Is there a library of good tools for LLMs to call? I have to imagine the bot-detection avoidance mechanisms are a major engineering effort and not likely to work out of the box with a simple harness and random local LLM.

gbear605 5 hours ago | parent | next [-]

If your volume is low enough, it should be pretty fine. It can just piggyback on your personal browser's Cloudflare cookies.

ossa-ma 9 hours ago | parent | prev | next [-]

Even the hosted ones are blocked from searching certain sites, for example Claude is banned from searching Reddit:

`Error: "The following domains are not accessible to our user agent: ['reddit.com']."`

wyre 9 hours ago | parent | prev [-]

Tavily, Exa, Firecrawl, Perplexity, and Linkup are all tools for agents to search the web.

I’ve been building a harness for the past few months, and it supports them all out of the box with an API key.
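For anyone curious what "supports them all" looks like structurally: harnesses like this usually normalize providers behind one interface and pick an adapter by name. A minimal sketch, where the provider class and its request shape are illustrative stubs, not the real Tavily/Exa/Firecrawl APIs:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


class SearchProvider(Protocol):
    """Common interface every provider adapter implements."""
    def search(self, query: str, max_results: int = 5) -> list[SearchResult]: ...


class StubProvider:
    """Illustrative stand-in; a real adapter would call the provider's
    HTTP API with the user's key and map its JSON into SearchResult."""
    def __init__(self, api_key: str):
        self.api_key = api_key

    def search(self, query: str, max_results: int = 5) -> list[SearchResult]:
        results = [SearchResult(title=f"result for {query}",
                                url="https://example.com", snippet="...")]
        return results[:max_results]


def pick_provider(name: str, api_key: str) -> SearchProvider:
    # Registry keyed by provider name; each entry would be a real adapter.
    registry = {"stub": StubProvider}
    return registry[name](api_key)
```

The payoff is that the agent loop only ever sees `SearchProvider.search()`, so swapping Tavily for Exa is a one-line registry change.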

goosejuice 8 hours ago | parent [-]

Kagi also has an API. People who hate ads are probably the same folks who should be paying for Kagi. That's the sane alternative world where companies respect their users.

wyre 2 hours ago | parent [-]

Oh, you got me so excited. I've had a Kagi sub for 3 years, but their API is still in closed beta. I guess I could (and should) reach out and ask for access.

chrisweekly 7 hours ago | parent | prev | next [-]

That's not how it works. Whether local or hosted, every modern model has a cutoff date for its training data, and can be leveraged by agents / harnesses / tools to fetch context from the internet or wherever.

darepublic 9 hours ago | parent | prev | next [-]

Local ones that support tool use can do the same
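To sketch what that means in practice: local servers like Ollama, llama.cpp, and vLLM expose an OpenAI-compatible endpoint that accepts the same tool schemas as the hosted APIs, so you hand the model a search tool and dispatch its calls yourself. Everything below is standard OpenAI-style tool calling; the `search_web` body is a placeholder you'd wire to any backend, and the endpoint URL is the usual Ollama default, not something special:

```python
import json

# Tool schema in the OpenAI function-calling format, which local servers
# also accept on /v1/chat/completions.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]


def search_web(query: str) -> str:
    # Placeholder: plug in any search backend (a paid API, a local
    # SearxNG instance, ...) here.
    return json.dumps([{"title": "stub result", "url": "https://example.com"}])


def dispatch_tool_call(call: dict) -> str:
    """Route a model-issued tool call to the matching local function."""
    handlers = {"search_web": search_web}
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])
    return handlers[name](**args)

# The chat loop itself (not run here) POSTs messages plus TOOLS to e.g.
# http://localhost:11434/v1/chat/completions; whenever the reply contains
# tool_calls, it feeds dispatch_tool_call(...) results back as
# role="tool" messages until the model produces a final answer.
```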

eightysixfour 9 hours ago | parent | prev [-]

You can do that locally too!

CSMastermind 9 hours ago | parent | prev [-]

What's the rough equivalent of a local model? Are we talking GPT-4?

2ndorderthought 8 hours ago | parent | next [-]

Qwen 3.6, released this month, is on the large side but still smaller than the frontier models. Supposedly it's at about Sonnet level when configured correctly, and it can be run on commodity hardware without buying a data center. https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_...

Then there are mid-size models that require multiple GPUs and are comparable to GPT's latest flagships.

Then there is Kimi 2.6, a monster that is beating Opus in some benchmarks. https://www.reddit.com/r/LocalLLaMA/comments/1sr8p49/kimi_k2...

It's basically whatever you can afford. Any trash-heap laptop can run code-autocomplete models locally, no problem. The rest take anything from an idle gaming PC up to a serious investment.

Terretta 9 hours ago | parent | prev | next [-]

Depends on your VRAM or "unified" memory for how smart it is, and CPU/GPU for how quick it is.

128GB of RAM? Sure, the early to mid 4s releases, except maybe 4o. And on an M5 Max, about the same speed.

I wouldn't really bother under 64GB (meaning 32GB or less) except for entertainment value (chats, summaries, tasky read-only agent things).
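The sizing logic behind those RAM tiers is simple enough to do on a napkin: weights take roughly (parameter count × bits per weight ÷ 8) bytes, plus some overhead for the KV cache and runtime buffers. A rough sketch, where the 20% overhead figure is a loose assumption rather than a measured number:

```python
def approx_vram_gb(params_billions: float, bits_per_weight: float,
                   overhead: float = 0.20) -> float:
    """Very rough VRAM / unified-memory estimate: raw weight size plus
    a fudge factor for KV cache and runtime buffers."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9


# A 70B model at 4-bit quantization: 35 GB of weights, ~42 GB total,
# so it squeezes into 64 GB of unified memory but not 32 GB.
print(round(approx_vram_gb(70, 4), 1))  # → 42.0
```

The same arithmetic shows why 128 GB opens up a different class of model: at 4-bit you can hold roughly 200B parameters of weights with room left over for context.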

kay_o 9 hours ago | parent | prev [-]

GLM 5.1 and DeepSeek 4 are acceptable, but between the hardware and energy costs, depending on your use case you may as well just buy tokens. They get useless and stupid rapidly if you quantize them enough to run on a single 16-24GB GPU.