jonaustin (18 minutes ago): And shout-out to Qwen if they release 122b -- Jeff Barr's original Gemma 4 tweet said they'd release a ~122b, then it got redacted :(
bertili (8 hours ago): Is there any source for these claims?
stingraycharles (8 hours ago): 397A17B = 397B total weights, 17B per expert?
zackangelo (8 hours ago): 17B per token. So when you’re generating a single stream of text (“decoding”), 17B parameters are active. If you’re decoding multiple streams, it is still 17B per token in each stream, but the total set of parameters touched is less than 17B times the number of streams, because some tokens use the same experts, so there is some overlap. When the model is ingesting the prompt (“prefilling”), it’s looking at many tokens at once, so the number of active parameters will be larger.
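To make the batching point concrete, here is a small Python sketch of a single MoE layer. It assumes 128 experts with top-8 routing (illustrative numbers, not Qwen3.5-397B's actual configuration) and simulates the router as a uniform random pick, which real learned routers are not. It counts how many distinct experts a batch of tokens touches: one token touches exactly 8, while a long prefill ends up touching nearly all of them.

```python
import random

def experts_touched(num_tokens: int, num_experts: int = 128, top_k: int = 8,
                    seed: int = 0) -> int:
    """Distinct experts in ONE MoE layer used by a batch of num_tokens tokens.

    Assumption: each token picks top_k experts uniformly at random; real
    learned routers are not uniform, so actual overlap will differ.
    """
    rng = random.Random(seed)
    touched = set()
    for _ in range(num_tokens):
        touched.update(rng.sample(range(num_experts), top_k))
    return len(touched)

def expected_touched(num_tokens: int, num_experts: int = 128, top_k: int = 8) -> float:
    # A given expert is skipped by one token with probability 1 - top_k/num_experts.
    return num_experts * (1 - (1 - top_k / num_experts) ** num_tokens)

for n in (1, 4, 16, 64, 512):
    print(f"{n:4d} tokens: {experts_touched(n):3d} experts simulated, "
          f"{expected_touched(n):5.1f} expected")
```

So each token still pays for its own 8 experts per layer, but the set of weights that has to be read from memory grows toward the whole layer as the batch or prompt gets longer.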
wongarsu (8 hours ago): 397B params, 17B activated at the same time. Those 17B might be split among multiple experts that are activated simultaneously.
littlestymaar (8 hours ago): That's not how it works. Many people get confused by the “expert” naming, when in reality the key word in the original name, “sparse mixture of experts”, is “sparse”. Experts are just chunks of each layer's MLP, and each token activates only a few of them. There are thousands of “experts” in such a model (for Qwen3-30B-A3B, it was 48 layers x 128 “experts” per layer, with only 8 active per layer for each token).
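A minimal sketch of the per-token routing described above, using the Qwen3-30B-A3B figures quoted in the comment (48 layers, 128 experts per layer, 8 active). The router weights and the "experts" are random toy stand-ins (the names `moe_layer` and `make_toy_expert` are hypothetical), not the real model's MLPs. The point is only that a token runs 8 of the 128 expert MLPs in a layer and mixes their outputs with renormalized gate weights.

```python
import math
import random

# Figures quoted in the comment for Qwen3-30B-A3B.
NUM_LAYERS, NUM_EXPERTS, TOP_K = 48, 128, 8

def make_toy_expert(scale):
    # Hypothetical stand-in for one expert MLP: it just scales its input.
    return lambda x: [scale * xi for xi in x]

def moe_layer(x, router_w, experts, top_k=TOP_K):
    """One sparse-MoE MLP block applied to a single token vector x."""
    # Router: one logit per expert (dot product with that expert's router row).
    logits = [sum(xi * wi for xi, wi in zip(x, row)) for row in router_w]
    # Keep only the top_k highest-scoring experts for this token.
    chosen = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
    # Softmax over the chosen logits, so the active experts' gates sum to 1.
    m = max(logits[e] for e in chosen)
    weights = {e: math.exp(logits[e] - m) for e in chosen}
    total = sum(weights.values())
    out = [0.0] * len(x)
    for e in chosen:  # only top_k of the experts actually run
        y = experts[e](x)
        gate = weights[e] / total
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

rng = random.Random(0)
dim = 16
router_w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
experts = [make_toy_expert(rng.uniform(0.5, 1.5)) for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(dim)]

out, chosen = moe_layer(token, router_w, experts)
print(f"{len(chosen)} of {NUM_EXPERTS} experts ran in this layer")
print(f"experts across all layers: {NUM_LAYERS * NUM_EXPERTS}")  # 6144
```

Implementations often take a softmax over all experts and then renormalize the top-k; that yields the same gates as the softmax over only the chosen logits used here.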
kylehotchkiss (6 hours ago): How many people on Hacker News can run a 397B-parameter model at home? Probably like 20-30.
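For a sense of scale, here is the back-of-the-envelope weight-memory arithmetic for 397B parameters at a few bit widths (`weight_gb` is a hypothetical helper). It counts weights only, with 1 GB = 10^9 bytes. Real quantized files carry some extra (block scales, tensors kept at higher precision), and the KV cache and runtime need memory on top of this.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return params_billion * bits_per_weight / 8

# Nominal bit widths; real quant formats add a little overhead per block.
for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4), ("3-bit", 3)]:
    print(f"{label:>5}: {weight_gb(397, bits):6.1f} GB")
```

Even a 4-bit quant needs roughly 200 GB for the weights alone, which is why the home setups in this thread revolve around 256GB of RAM or unified memory.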
jubilanti (5 hours ago): You can rent a cloud H200 with 141GB of VRAM in a server with 256GB of system RAM for $3-4/hr.
adrian_b (an hour ago): The 397B model can be run at home with the weights stored on an SSD (or on two SSDs, for double the throughput). It's probably too slow for chat, but usable as a coding assistant.
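A rough upper bound on what SSD streaming gives you, assuming every active weight has to be read from disk for every token. The inputs are assumptions, not measurements: 17B active parameters, about 4.25 bits per weight (a 4-bit format with per-block scales), and about 7 GB/s of sequential read per PCIe 4.0 NVMe drive. Scattered expert reads are usually slower than sequential ones, while caching hot experts and shared layers in RAM raises the effective rate, so treat this as an order-of-magnitude estimate.

```python
def max_decode_tps(active_params_billion: float, bits_per_weight: float,
                   read_gb_per_s: float) -> float:
    """Tokens/s ceiling when all active weights are streamed from storage per token."""
    gb_per_token = active_params_billion * bits_per_weight / 8  # GB read per token
    return read_gb_per_s / gb_per_token

ACTIVE_B = 17      # 17B active parameters per token (from the model name)
BITS = 4.25        # assumed: 4-bit weights plus block scales
SSD_GB_S = 7.0     # assumed sequential read speed of one PCIe 4.0 NVMe SSD

one_ssd = max_decode_tps(ACTIVE_B, BITS, SSD_GB_S)
two_ssds = max_decode_tps(ACTIVE_B, BITS, 2 * SSD_GB_S)
print(f"1 SSD:  {one_ssd:.2f} tok/s")   # about 0.78
print(f"2 SSDs: {two_ssds:.2f} tok/s")  # about 1.55
```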
xienze (an hour ago): I think you have that backwards. Agentic coding is way more demanding than simple chat. The request/response loops (tool calling) are much tighter and more numerous, and the context is waaaaay bigger in general.
r-w (6 hours ago): OpenRouter.
mistercheese (5 hours ago): Yeah, I think there are benefits to third-party providers being able to run the large models and offer stronger guarantees about zero data retention (ZDR) and where the models are hosted! So open weights are still useful even for the large models we can't personally serve on our laptops.
parsimo2010 (5 hours ago): If you're using it through OpenRouter, you might as well use Qwen3.6 Plus. You don't need to be picky about a particular model size of 3.6. If you just want the 397B version to save money, just pick a cheaper model like M2.7.
ydj (3 hours ago): Running the Unsloth MXFP4 quant of Qwen3.5-397B-A17B, I get 40 t/s prefill and 20 t/s decode on an AMD Threadripper PRO 9965WX with 256GB of DDR5-5600 and an RTX 4090.
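Decode speed on a CPU-heavy setup like this is bounded mostly by memory bandwidth. A sketch, assuming all eight DDR5 channels of the Threadripper PRO platform are populated (the comment doesn't say) and roughly 4.25 bits per weight for the MXFP4 quant: peak DDR5 bandwidth is channels × transfer rate × 8 bytes, and each decoded token reads about 17B active parameters' worth of weights. Layers offloaded to the RTX 4090 don't touch system RAM, so this is only a rough ceiling for the all-in-RAM case.

```python
def ddr5_peak_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s; each DDR5 channel moves 8 bytes per transfer."""
    return channels * mega_transfers_per_s * 8 / 1000

def decode_ceiling_tps(bandwidth_gb_s: float, active_params_billion: float = 17,
                       bits_per_weight: float = 4.25) -> float:
    """Upper bound on decode tokens/s if every active weight is read once per token."""
    return bandwidth_gb_s / (active_params_billion * bits_per_weight / 8)

bw = ddr5_peak_gb_s(channels=8, mega_transfers_per_s=5600)  # 8 channels is an assumption
print(f"peak RAM bandwidth: {bw:.1f} GB/s")  # 358.4
print(f"decode ceiling: {decode_ceiling_tps(bw):.1f} tok/s")
```

Sustained bandwidth is usually well below the theoretical peak, so a measured decode rate around half the ceiling is not surprising.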
bitbckt (4 hours ago): I'm running it on dual DGX Sparks.
stavros (5 hours ago): It doesn't matter how many can run it now; it's about freedom. Having a large open-weights model available allows you to do things you can't do with closed models.
kridsdale3 (5 hours ago): I can (barely, but sustainably) run Qwen3.5 397B on my Mac Studio with 256GB of unified memory. It cost $10,000, but that's well within reach for most people here, I expect.
qlm (5 hours ago): Hacker News moment
toxik (5 hours ago): $10k is well outside my budget for frivolous computer purchases.
zozbot234 (an hour ago): It would be plenty in-budget if the software side of local AI were a bit more full-featured than it is at present. I want things like SSD offload for cold expert weights and/or for saved/cached KV context, dynamic context sizing, NPU use for prefill, distributed inference over the network, etc. to all just work for most users, without them having to set anything up in an error-prone way. The system should not just explode when someone tries to run something slightly larger; it should degrade gracefully and let them figure out where the reasonable limits are.
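As a sketch of what "SSD offload for cold expert weights" could look like: keep recently used experts in RAM in an LRU cache and read the rest from disk on a miss. Everything here is hypothetical: `ExpertCache` and `fake_ssd_load` are illustrative names, the loader returns a string instead of reading real weights, and the skewed routing pattern (90% of lookups going to 24 "hot" experts out of 128) is made up to show the hit-rate effect.

```python
import random
from collections import OrderedDict

def fake_ssd_load(key):
    # Hypothetical stand-in for reading one expert's weights from SSD.
    layer, expert = key
    return f"weights for layer {layer}, expert {expert}"

class ExpertCache:
    """Hypothetical LRU cache: hot experts stay in RAM, cold ones load on demand."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn
        self.cache = OrderedDict()  # key -> weights, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        weights = self.load_fn(key)  # cold expert: fetch from "SSD"
        self.cache[key] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used expert
        return weights

rng = random.Random(0)
cache = ExpertCache(capacity=32, load_fn=fake_ssd_load)
for _ in range(10_000):
    # Made-up skewed routing: 90% of lookups go to 24 "hot" experts out of 128.
    e = rng.randrange(24) if rng.random() < 0.9 else rng.randrange(128)
    cache.get((0, e))  # (layer, expert) key
print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.1%}")
```

LRU is just the simplest policy to show here; the benefit depends entirely on how skewed real expert usage is.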
stefs (2 hours ago): Yeah, but if you really, really wanted to and/or your livelihood depended on it, you probably could afford it.
bdangubic (5 hours ago): 99.97% of HN users are nodding… :)
hparadiz (3 hours ago): There are so many good uses for running these models locally that I fully expect a standard workstation 10 years from now to start at 128GB of RAM and include at least a dedicated inference device.
bdangubic (3 hours ago): Or, if you believe a lot of the HN crowd, we are in an AI bubble, and in 10 years inference will be dirt cheap once all of this crashes and we're left with all this hardware in data centers, so it won't make any sense to run monster workstations at home. (I work on a 128GB M4 but don't run inference on it, just too many Electron apps running at the same time...) :)
hparadiz (2 hours ago): Inference will be dirt cheap for things like coding, but you'll want much more compute for architectural planning, personal assistants with persistent real-time "thinking / memory", and real-time multimedia. I could put 10 M4s to work right now and it wouldn't be enough for what I've been cooking.
SlavikCA (5 hours ago): I'm running it on my Intel Xeon W5 with 256GB of DDR5 and 72GB of Nvidia VRAM. I paid $7-8k for this system; it would probably cost twice as much now. I'm using UD-IQ4_NL quants and getting 13 t/s, with thinking disabled.
kylehotchkiss (36 minutes ago): You have proved my point.
rwmj (5 hours ago): For some reason you were being downvoted, but I enjoy hearing how people are running open-weights models at home (NOT in the cloud), and what kind of hardware they need, even if it's out of my price range.