danielhanchen 2 days ago

Thank you!

I presume the 24B is somewhat faster since only ~4B parameters are activated per token - the 31B is quite a large dense model, so it should be more accurate!
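The speed intuition above can be sketched with back-of-envelope arithmetic: decode compute scales roughly with the parameters touched per token, so a mixture-of-experts model with ~4B active parameters does far less work per token than a 31B dense model. A minimal sketch (the "2 FLOPs per parameter" rule of thumb and the parameter counts are illustrative assumptions, not measurements of any specific model):

```python
# Rough decode cost per generated token, assuming compute scales with
# the parameters actually activated (a common rule of thumb, not exact).
def flops_per_token(active_params_b: float) -> float:
    """~2 FLOPs per active parameter per generated token."""
    return 2 * active_params_b * 1e9

moe = flops_per_token(4)     # e.g. a 24B MoE activating ~4B params/token
dense = flops_per_token(31)  # a 31B dense model activates everything

print(f"dense/MoE compute ratio per token: {dense / moe:.2f}x")
```

Memory bandwidth follows the same rough scaling, which is why MoE models decode faster even though all weights still have to fit in memory.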

ryandrake 2 days ago | parent [-]

This is one of the more confusing aspects of experimenting with local models as a noob. Given my GPU, which model should I use, which quantization of that model should I pick (unsloth tends to offer over a dozen!) and what context size should I use? Overestimate any of these, and the model just won't load and you have to trial-and-error your way to finding a good combination. The red/yellow/green indicators on huggingface.co are kind of nice, but you only know for sure when you try to load the model and allocate context.
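The trial-and-error described above can be shortened with a back-of-envelope VRAM estimate before downloading anything: quantized weights cost roughly `params × bits-per-weight / 8` bytes, and the KV cache grows linearly with context length. A minimal sketch - the architecture numbers (layers, KV heads, head dim) and the ~4.5 bits/weight figure for a Q4_K_M-style quant are hypothetical examples, not from any specific model card:

```python
# Rough VRAM estimate for a quantized model plus its KV cache, to guess
# whether a (model, quant, context) combination fits on a given GPU.

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Memory for quantized weights: params * bits / 8 bytes."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 (keys + values) per layer, fp16 elements by default."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Hypothetical 24B model at ~4.5 bits/weight with a 32k context window
w = weights_gb(24, 4.5)
kv = kv_cache_gb(layers=40, kv_heads=8, head_dim=128, context=32768)
print(f"weights ~{w:.1f} GB + KV cache ~{kv:.1f} GB = ~{w + kv:.1f} GB")
```

Comparing that total against your GPU's VRAM (minus a GB or so of overhead) tells you quickly whether to try a smaller quant or a shorter context, instead of waiting for the load to fail.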

danielhanchen 2 days ago | parent [-]

Unsloth Studio can definitely help - we recommend specific quants (like Gemma-4) and also auto-calculate the context length, etc.!

ryandrake 2 days ago | parent [-]

I'll have to try it out. I always thought it was more for fine-tuning and less for inference.

danielhanchen 2 days ago | parent [-]

Oh yes, sadly we partially mis-communicated haha - it does both, plus synthetic data generation and exporting!