segmondy 4 days ago

Yes, and plenty of others do too. Quantized. Join us at r/localllama

My largest models

   318G    /llmzoo/models/Qwen3.5-397B
   377G    DeepSeekv3.2-nolight
   380G    /llmzoo/models/DeepSeek-V3.2-UD
   400G    /llmzoo/models/Qwen3.5-397B-Q8
   443G    DeepSeek-Math-v2
   443G    DeepSeek-V3-0324-Q5
   522G    /llmzoo/models/GLM5.1
   545G    /llmzoo/models/kimi2.6
   546G    /llmzoo/models/KimiK2.5
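The sizes above line up with simple quantization arithmetic: on-disk size is roughly parameters × bits-per-weight ÷ 8, plus some overhead for embeddings and metadata. A minimal sketch (the bits-per-weight values below are back-calculated from the listed sizes, not confirmed quant settings):

```python
# Rough size estimate for a quantized model:
# parameters (billions) * bits-per-weight / 8 ~= size in GB.
# Real files add overhead, so treat these as lower bounds.

def quant_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate file size in GB for a params_b-billion-parameter model."""
    return params_b * bits_per_weight / 8

# A 397B model at Q8 (~8 bits/weight) lands near the 400G listed above:
print(round(quant_size_gb(397, 8)))    # ~397
# The 318G copy of the same model is consistent with ~6.4 bits/weight:
print(round(quant_size_gb(397, 6.4)))  # ~318
```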
danilocesar 3 days ago | parent | next [-]

Is your house's heating system based on H100s?

Liftyee 4 days ago | parent | prev | next [-]

What hardware do you use?

Terretta 20 hours ago | parent | next [-]

Most of those have custom quants for the Mac Studio M3 Ultra 512GB. You'll typically see it mentioned by name.

Everything on that list but the last three runs at these sizes. For the last three, look for a custom quant, e.g. 9.5 bits, and/or a mention of the M3 Ultra 512GB.

Not sure in which direction I'm surprised, but a MacBook Pro M5 Max ticks over the same models at the same speed. With "only" 128GB, look for models of 116GB (about the max that retains reasonable stability) or less.
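The 116GB-on-a-128GB-machine rule of thumb above amounts to leaving roughly 12GB of headroom for the OS, KV cache, and activations. A quick fit check under that assumption (the headroom figure is inferred from the comment, not a fixed rule):

```python
# Sketch: will a quantized model fit in unified memory?
# headroom_gb ~12 is an assumption derived from "116 GB usable of 128 GB".

def fits(model_gb: float, ram_gb: float, headroom_gb: float = 12.0) -> bool:
    """True if the model file leaves enough RAM free for OS + inference state."""
    return model_gb <= ram_gb - headroom_gb

print(fits(116, 128))  # True: right at the stated limit
print(fits(318, 512))  # True: the 318G Qwen3.5-397B fits a 512GB Mac Studio
print(fits(546, 512))  # False: KimiK2.5 at 546G needs a smaller custom quant
```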

MezzoDelCammin 3 days ago | parent | prev | next [-]

I think the answer to this is: "yes"

CoolThings 3 days ago | parent | prev | next [-]

A Beowulf cluster of 256 x Raspberry Pi 3s.

hhh 2 days ago | parent [-]

I used to maintain a 2,000-node Pi 4 cluster, before LLMs were relevant, with around 6 GB of free RAM per node. I wonder what I could have done with something like this.

tclancy 3 days ago | parent | prev [-]

All of it.

chid 3 days ago | parent | prev [-]

Even quantised, those are HUGE.