vladgur 3 hours ago
This is getting very close to fitting on a single 3090 with 24 GB of VRAM :)
originalvichy 3 hours ago | parent | next
Yup! Smaller quants will fit within 24 GB, but they may sacrifice context length. I'm excited to try out the MLX version to see whether 32 GB of memory on a Pro M-series Mac can reach acceptable tok/s with longer context. HuggingFace has already uploaded some MLX versions.
GaggiX 3 hours ago | parent | prev
At 4-bit quantization it should already fit quite nicely.
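The arithmetic behind "fits at 4-bit" is easy to sketch. The thread doesn't name the model's parameter count, so the 32B figure below is purely illustrative, and the flat overhead term for activations/KV cache is a rough assumption, not a measured number:

```python
def model_vram_gb(params_billion: float, bits_per_weight: float,
                  overhead_gb: float = 1.5) -> float:
    """Very rough VRAM estimate: weight storage at the given
    quantization width, plus a flat allowance for activations
    and KV cache (the real KV cost grows with context length)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1024**3 + overhead_gb

# Hypothetical 32B-parameter model on a 24 GB card:
print(f"{model_vram_gb(32, 16):.1f} GB at fp16")   # well over 24 GB
print(f"{model_vram_gb(32, 4):.1f} GB at 4-bit")   # under 24 GB
```

This is also why smaller quants trade context length for headroom: whatever VRAM the weights free up is what's left for the KV cache.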