Greed 3 hours ago
But why? Spending several thousand dollars to run sub-par models when the break-even point could still be years away seems bizarre for any real use case where the goal is productivity over novelty. Anyone who has used Codex or Opus can attest that the difference between those and a locally available model like Qwen or Codestral is night and day. To be clear, I totally get the idea of running local LLMs for toy reasons. But in a business context, the sell on a stack of Mac Pros seems misguided at best.
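The break-even framing above can be made concrete with some simple arithmetic: local hardware pays for itself only once cumulative API spend would have exceeded the outlay. All numbers below are illustrative assumptions, not quotes from any provider.

```python
# Hedged sketch: months until local hardware breaks even against a hosted
# API, under assumed (hypothetical) prices and usage.

def months_to_break_even(hardware_cost: float,
                         tokens_per_month: float,
                         api_price_per_mtok: float,
                         local_power_cost_per_month: float = 0.0) -> float:
    """Months until cumulative API spend exceeds the hardware outlay."""
    api_monthly = (tokens_per_month / 1_000_000) * api_price_per_mtok
    net_saving = api_monthly - local_power_cost_per_month
    if net_saving <= 0:
        return float("inf")  # local never pays for itself
    return hardware_cost / net_saving

# e.g. $5,000 of hardware, 20M tokens/month, $10 per 1M tokens, $30/mo power:
print(round(months_to_break_even(5_000, 20_000_000, 10.0, 30.0), 1))
```

At the assumed numbers that is roughly two and a half years, which is why the "years away" point in the comment is plausible for moderate usage.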
0x457 2 hours ago
I started doing it to hedge against the inevitable disappearance of cheap inference.
robotresearcher 2 hours ago
Sometimes you can't push your working data to a third-party service, whether by law, by contract, or by preference.
nurettin an hour ago
I ran the Qwen 3.5 35B A3B Q4 model locally on a Ryzen server with a 64k context window, getting 5-8 tokens a second. It's the first local model I've tried that could reason properly, similar to Gemini 2.5 or Sonnet 3.5. I gave it some tools to call and asked Claude to order it around (download quotes, print charts, set up a GNOME extension); even Claude was sort of impressed that it could get the job done.

Point is, it's really close. It isn't Opus 4.5 yet, but it's very promising given the size. Local is definitely getting there, even without GPUs. But you're right, I see no reason to spend right now.
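The "gave it some tools to call" setup above can be sketched against any OpenAI-style tool-calling schema, which local servers such as llama.cpp's HTTP server and Ollama also speak. The tool name (`download_quotes`, echoing the comment) and the stubbed result here are illustrative assumptions, not the commenter's actual code.

```python
import json

# Hedged sketch of wiring one local tool into an OpenAI-style
# tool-calling loop. The tool definition below is what gets sent to the
# model; dispatch() handles the JSON tool call the model emits back.

TOOLS = [{
    "type": "function",
    "function": {
        "name": "download_quotes",  # hypothetical tool, per the comment
        "description": "Fetch recent price quotes for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"symbol": {"type": "string"}},
            "required": ["symbol"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-issued tool call to a local Python function."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "download_quotes":
        # Stubbed result; a real implementation would hit a data source.
        return json.dumps({"symbol": args["symbol"], "last": 123.45})
    raise ValueError(f"unknown tool: {name}")

# Shape of the JSON a model would emit when it decides to use the tool:
call = {"function": {"name": "download_quotes",
                     "arguments": '{"symbol": "AAPL"}'}}
print(dispatch(call))
```

The dispatcher's string return value is what you would append to the conversation as a `tool`-role message before asking the model to continue.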