thewebguyd 7 hours ago
I'd go for at least 32GB. It'll fit in 24GB, but that leaves you little to no room for context, and that's at 4-bit quantization. If you want to run unquantized, you definitely need 128GB.
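The arithmetic behind sizing like this can be sketched roughly as below. This is a minimal estimate of weight memory only, and everything specific in it is an assumption: the 30-billion parameter count is a hypothetical placeholder (the thread does not state the model size), and the function names are made up for illustration. Real quantized formats also store scales and other metadata, so effective bits per weight run somewhat higher than the nominal figure, and the KV cache for context comes on top.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def headroom_gb(total_gb: float, n_params: float, bits_per_weight: float) -> float:
    """Memory left over for context (KV cache), the OS, and everything else."""
    return total_gb - weight_memory_gb(n_params, bits_per_weight)


# Hypothetical parameter count, NOT the model discussed in the thread.
N_PARAMS = 30e9

if __name__ == "__main__":
    for bits in (4, 8, 16):
        weights = weight_memory_gb(N_PARAMS, bits)
        for total in (24, 32, 128):
            left = headroom_gb(total, N_PARAMS, bits)
            status = "fits" if left > 0 else "does not fit"
            print(f"{bits:>2}-bit: {weights:6.1f} GB weights, "
                  f"{total:>3} GB machine -> {status}, {left:6.1f} GB headroom")
```

Whatever headroom remains is what the context has to live in, which is why a configuration that technically fits can still be impractical.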
Catloafdev 6 hours ago
Nobody runs unquantized; there's literally no reason to. Q8 is the largest anyone actually runs on consumer hardware for inference.
bitexploder 6 hours ago
It also comes down to inference speed, not just "can I run this". An 8-bit quant is quite a bit slower than 4-bit on an M5 Pro.
gchamonlive 6 hours ago
[dead]