| ▲ | 2ndorderthought 9 hours ago |
| In the past month local models have been ramping up in a major way, while the name-brand providers have upped prices, gone offline randomly, and started doing slimier and slimier things. I really think the future is local compute, or at least self-hosted models. |
|
| ▲ | SchemaLoad 9 hours ago | parent | next [-] |
| The hosted ones still have the advantage of being able to search the internet for live info rather than being limited to a knowledge cutoff date. |
| |
| ▲ | gbear605 9 hours ago | parent | next [-] | | I’m not sure why a model needs to be hosted in order to make network calls? | | |
| ▲ | hansvm 9 hours ago | parent [-] | | Is there a library of good tools for LLMs to call? I have to imagine the bot-detection avoidance mechanisms are a major engineering effort and not likely to work out of the box with a simple harness and random local LLM. | | |
| ▲ | gbear605 5 hours ago | parent | next [-] | | If your volume is low enough, it should be pretty fine. It can just piggyback on your personal browser's cookies to get past Cloudflare. | |
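A minimal sketch of the cookie-piggybacking idea above, assuming the third-party browser-cookie3 package alongside requests; the domain and URL are placeholders, and this only helps for sites your own browser has already cleared:

```python
# Sketch: reuse the local browser's cookie jar (including any Cloudflare
# clearance cookies) for low-volume fetches by a local agent.
# Assumes `pip install browser-cookie3 requests`; example.com is a placeholder.
import browser_cookie3
import requests

# Load the cookies your own browser already holds for the target domain.
cookies = browser_cookie3.chrome(domain_name="example.com")  # or .firefox()

resp = requests.get(
    "https://example.com/some-page",
    cookies=cookies,
    headers={"User-Agent": "Mozilla/5.0"},  # look like a normal browser
    timeout=30,
)
print(resp.status_code, len(resp.text))
```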
| ▲ | ossa-ma 9 hours ago | parent | prev | next [-] | | Even the hosted ones are blocked from searching certain sites; for example, Claude is banned from searching Reddit: `Error: "The following domains are not accessible to our user agent: ['reddit.com']."` | |
| ▲ | wyre 9 hours ago | parent | prev [-] | | Tavily, Exa, Firecrawl, Perplexity, and Linkup are all tools for agents to search the web. I’ve been building a harness for the past few months, and it supports them all out of the box with an API key. | |
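A rough sketch of what one of those search tools looks like from the harness side, using Tavily's REST endpoint as the example; the request and response shapes here are from memory, so treat them as assumptions and check the provider's current docs:

```python
# Sketch: a web-search "tool" a local agent harness can call.
# Assumes TAVILY_API_KEY is set in the environment; field names are
# best-effort recollections of Tavily's API, not verified against docs.
import os
import requests

def web_search(query: str, max_results: int = 5) -> list[dict]:
    """Return a list of {title, url, content} results for the query."""
    resp = requests.post(
        "https://api.tavily.com/search",
        json={
            "api_key": os.environ["TAVILY_API_KEY"],
            "query": query,
            "max_results": max_results,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    for r in web_search("Qwen 3.6 benchmarks"):
        print(r["title"], "-", r["url"])
```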
| ▲ | goosejuice 8 hours ago | parent [-] | | Kagi also has an API. People who hate ads are probably the same folks who should be paying for Kagi. That's the sane alternative world where companies respect their users. | |
| ▲ | wyre 2 hours ago | parent [-] | | Oh, you got me so excited. I've had a Kagi sub for 3 years, but their API is still in closed beta. I guess I could (and should) reach out and ask for access. |
|
| ▲ | chrisweekly 7 hours ago | parent | prev | next [-] | | That's not how it works. Whether local or hosted, every modern model has a cutoff date for its training data, and can be leveraged by agents / harnesses / tools to fetch context from the internet or wherever. | |
| ▲ | darepublic 9 hours ago | parent | prev | next [-] | | Local ones that support tool use can do the same | |
| ▲ | eightysixfour 9 hours ago | parent | prev [-] | | You can do that locally too! |
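For the curious, a minimal sketch of the local tool-use flow the last two replies describe, assuming an Ollama server on its default port with a tool-capable model already pulled; the model tag and tool name are illustrative:

```python
# Sketch: tool calling against a locally served model via Ollama's /api/chat.
# The endpoint and response shape follow Ollama's chat API; verify against
# the version you run. The model tag below is just an example.
import requests

TOOLS = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for live information",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3",  # illustrative tag; use whatever you have pulled
        "messages": [{"role": "user", "content": "What changed in Kimi 2.6?"}],
        "tools": TOOLS,
        "stream": False,
    },
    timeout=120,
)
# If the model decides it needs live info, it emits tool calls instead of text.
for call in resp.json()["message"].get("tool_calls", []):
    fn = call["function"]
    print("model wants:", fn["name"], fn["arguments"])  # dispatch to your tool here
```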
|
|
| ▲ | CSMastermind 9 hours ago | parent | prev [-] |
| What's the rough equivalent of a local model? Are we talking GPT-4? |
| |
| ▲ | 2ndorderthought 8 hours ago | parent | next [-] | | Qwen 3.6, released this month, is large but still on the smaller end. Supposedly it's at about Sonnet level when configured correctly, and it can be run on commodity hardware without purchasing a data center. https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_... Then there are mid-size models that need multiple GPUs and are roughly on par with GPT's latest flagships. And then there is Kimi 2.6, a monster that is beating Opus on some benchmarks. https://www.reddit.com/r/LocalLLaMA/comments/1sr8p49/kimi_k2... It's basically whatever you can afford: any trash-heap laptop can run code-autocomplete models locally no problem, while anything more takes some level of investment, from an idle gaming PC up to serious money. | |
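As a concrete illustration of the "commodity hardware" point, here's a minimal sketch of loading a quantized GGUF build with llama-cpp-python; the filename is hypothetical, and the knobs shown are the usual ones rather than settings from the linked threads:

```python
# Sketch: running a quantized local model with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a GGUF file on disk; a Q4 quant
# of a mid-size model fits on a decent gaming GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen-3.6-q4_k_m.gguf",  # hypothetical filename
    n_ctx=8192,          # context window; raise it if you have the memory
    n_gpu_layers=-1,     # offload every layer to the GPU (0 = CPU only)
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}]
)
print(out["choices"][0]["message"]["content"])
```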
| ▲ | Terretta 9 hours ago | parent | prev | next [-] | | Depends on your VRAM or "unified" memory for how smart it is, and CPU/GPU for how quick it is. 128GB of RAM? Sure, the early to mid 4s releases, except maybe 4o. And on an M5 Max, about the same speed. I wouldn't really bother under 64GB (meaning 32GB or less) except for entertainment value (chats, summaries, tasky read-only agent things). | |
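A back-of-envelope sketch of why the RAM tiers above matter: weights-only memory is roughly parameter count times bits per weight, and the real footprint (KV cache, activations) sits on top of that. The model sizes and quant widths below are illustrative:

```python
# Sketch: rough weights-only memory for a model at a given quantization.
# Real usage runs higher once the KV cache and activations are counted.
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for params, bits, label in [(8, 4.5, "8B @ Q4"),
                            (70, 4.5, "70B @ Q4"),
                            (70, 8.0, "70B @ Q8")]:
    print(f"{label}: ~{weights_gb(params, bits):.0f} GB")
# Prints roughly: 8B @ Q4: ~4 GB, 70B @ Q4: ~37 GB, 70B @ Q8: ~65 GB
```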
| ▲ | kay_o 9 hours ago | parent | prev [-] | | GLM 5.1 and DeepSeek 4 are acceptable, but the hardware and energy costs are high enough that, depending on your use case, you may as well purchase tokens. They get useless and stupid rapidly if you quantize them enough to run on a single 16-24GB GPU. |
|