revolvingthrow 2 days ago
For flash? 4-bit quant, 2x 96 GB GPUs (fast and expensive), or 1x 96 GB GPU + 128 GB RAM (still expensive but probably usable, if you're patient). A Mac with 256 GB memory would run it but be very slow, and so would a 256 GB RAM + cheap-GPU desktop, unless you leave it running overnight. The big model? Forget it, not this decade. You can theoretically load from SSD, but waiting for the reply will be a religious experience. Realistically, the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B parameters, depending on how far you're willing to go on quantization. Even that is fairly expensive, and that's before RAM prices went to the moon.
zargon 2 days ago | parent
Flash is less than 160 GB, so no need to quantize to fit in 2x 96 GB; that leaves roughly 30 GB of headroom. Not sure how much context fits in 30 GB, but it should be a good amount.
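The sizing arithmetic both commenters are doing can be sketched out. This is a rough back-of-the-envelope calculator, not the actual specs of either model; the 300B parameter count and the 30 GB context reserve are illustrative assumptions:

```python
# Rough VRAM sizing for a quantized LLM. All numbers here are
# illustrative assumptions, not the real "flash" model's specs.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """GB needed for the weights alone: 1B params at 8 bits = 1 GB."""
    return params_billion * bits_per_weight / 8

def fits(weights_gb: float, vram_gb: float, context_reserve_gb: float = 30) -> bool:
    """Do the weights fit with headroom reserved for KV cache / context?"""
    return weights_gb + context_reserve_gb <= vram_gb

# Hypothetical ~300B-parameter model at 4-bit quantization:
w = weight_gb(300, 4)                # 150.0 GB of weights
print(w, fits(w, 2 * 96))            # fits in 2x 96 GB with ~30 GB to spare
print(fits(w, 96))                   # a single 96 GB card does not
```

The same function shows why the "big model" is out of reach: push the parameter count toward 1T and even aggressive quantization leaves the weights far beyond any consumer GPU budget, which is where the load-from-SSD option (and its glacial token rate) comes in.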