jazzyjackson 5 hours ago
They’ll sell you a bundle, either a pair or a quartet, so you can have 256 or 512GB over a 400GB/s network link. I can’t figure out when it makes sense to pay 10k up front for a quantized Llama 3.1, but it’s an interesting option.
c7b 3 hours ago
You could fit a Q4 GLM5.2 in 512GB and still have some space for context (372-475GB for the model): https://unsloth.ai/docs/models/glm-5.2 But yeah, there's a bit of a dearth of models that could fully utilize memory in the 128-256GB bracket at the moment. But things move so fast in this space, I wouldn't base my decision on a generation of models that's just a few months old.
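To make the "some space for context" arithmetic concrete, here is a minimal sketch that only subtracts the quoted model sizes from the bundle sizes mentioned in the thread. The function name `headroom_gb` is made up for illustration, and the result is an upper bound: it ignores runtime overhead and says nothing about how many tokens of context that memory actually holds.

```python
# Rough headroom check using only the figures quoted in the thread.
BUNDLES_GB = {"pair": 256, "quartet": 512}   # bundle sizes from the thread
MODEL_GB_RANGE = (372, 475)                  # quoted Q4 model size range


def headroom_gb(total_gb, model_gb):
    """Memory left for context (KV cache) and overhead; negative means no fit."""
    return total_gb - model_gb


for name, total in BUNDLES_GB.items():
    low, high = MODEL_GB_RANGE
    best = headroom_gb(total, low)     # smallest quoted model size
    worst = headroom_gb(total, high)   # largest quoted model size
    if best < 0:
        print(f"{name} ({total}GB): does not fit")
    else:
        print(f"{name} ({total}GB): {worst} to {best} GB left over")
```

On these numbers only the quartet fits the model, with roughly 37 to 140GB left over depending on which end of the size range applies.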
girvo 3 hours ago
Not Llama 3.1, but Step 3.7 Flash is one of the few new high-quality models in this size bracket. DeepSeek v4 Flash too.
SkitterKherpi 4 hours ago
10k is rather a lot, yes. For LLMs, 10k buys a lot of hosted tokens with less hassle than running the machine yourself (and it's not like electricity is free), but for some other things, like video models, 10k would get burned very fast. I am looking for something more in the 5k range though.
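As a back-of-the-envelope sketch of the "a lot of tokens" point: the price below is a placeholder and not a quote from any provider, and `tokens_for_budget` is a made-up helper. Plug in whatever rate you actually pay.

```python
def tokens_for_budget(budget_usd, usd_per_million_tokens):
    """How many hosted tokens a fixed budget buys at a flat per-token price."""
    return budget_usd / usd_per_million_tokens * 1_000_000


BUDGET_USD = 10_000              # the up-front hardware price from the thread
HYPOTHETICAL_PRICE_USD = 2.0     # ASSUMPTION: placeholder price per million tokens

tokens = tokens_for_budget(BUDGET_USD, HYPOTHETICAL_PRICE_USD)
print(f"{tokens:,.0f} tokens for ${BUDGET_USD:,}")
```

This leaves out electricity on the hardware side and any per-request pricing differences on the hosted side, so treat it as an order-of-magnitude comparison only.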