vibe42 18 hours ago

Higher-end gaming laptops are still decently priced and work well for local AI inference.

And Linux runs better than ever on them; I'm running Debian 13 with almost no driver issues.

For $2k you can get 32 GB DDR5 RAM and 16 GB fast VRAM. Bump the RAM to 64 GB and you're still below $3k.

solstice 17 hours ago | parent [-]

What models or classes of models would I be able to run on that hardware?

I've asked myself that question while looking at some of the models on this site: https://laptopparts4less.frl/index.php?route=common/home

vibe42 16 hours ago | parent [-]

With 16 GB VRAM you can run a decent quant (Q4-Q8) of the newer, smaller dense models, which leaves room for a 32-256k token context.
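Back-of-envelope, the weight memory is just parameter count times bits per weight. A sketch (the effective bits-per-weight figures are rough assumptions for illustration, not vendor numbers):

```python
# Rough VRAM estimate for the weights of a quantized dense model.
# Effective bits/weight (including quant overhead) is a ballpark assumption.

def model_vram_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: billions of params * bytes per weight."""
    return params_b * bits_per_weight / 8

# Example: a hypothetical 14B dense model at ~Q4 vs ~Q8 effective rates
q4 = model_vram_gb(14, 4.5)
q8 = model_vram_gb(14, 8.5)
print(f"14B @ ~Q4: {q4:.1f} GB, @ ~Q8: {q8:.1f} GB")
```

On 16 GB of VRAM, the Q4 case leaves several GB for KV cache and activations, which is where the large context budget comes from; the Q8 case is already tight before any context.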

That might not be enough to chew through a large code base, but for smaller projects it can easily fit most if not all of the code, which is enough to drive a good coding agent.
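To get a feel for what "fits", a common rule of thumb is roughly 4 bytes of source code per token (an assumption, not an exact figure; real tokenizers vary):

```python
# Rule-of-thumb token count for source code.
# The ~4 bytes/token ratio is an assumed average, not a tokenizer measurement.

def approx_tokens(code_bytes: int, bytes_per_token: float = 4.0) -> int:
    return int(code_bytes / bytes_per_token)

# A ~500 KB project lands around 125k tokens,
# comfortably inside a 256k context window.
print(approx_tokens(500_000))
```

So a small-to-medium project really can sit entirely in context, while a multi-megabyte code base cannot.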

I don't recommend specific models or model providers due to how much hype and BS there is around benchmarks etc. Easiest is to check the latest generation of open models and look for a dense-type where a decent quant fits within the VRAM.

Some models run fast enough that some of the weights can spill over from VRAM to RAM while maintaining a usable prompt/token gen speed.
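In llama.cpp, for example, this split is controlled with `--n-gpu-layers` (`-ngl`). A sketch of picking the split, where the per-layer size and layer count are hypothetical numbers for illustration:

```python
# Sketch: choose how many transformer layers to keep in VRAM,
# letting the remainder spill to system RAM. Assumes a uniform
# (hypothetical) per-layer size and reserves VRAM for KV cache.

def gpu_layer_split(n_layers: int, layer_gb: float, vram_gb: float,
                    reserve_gb: float = 2.0) -> int:
    """Number of layers to keep on the GPU."""
    budget = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(budget / layer_gb))

# Example: 40 layers of ~0.4 GB each on a 16 GB card
print(gpu_layer_split(40, 0.4, 16.0))  # pass the result as -ngl to llama.cpp
```

The fewer layers spill to RAM, the smaller the hit to token generation speed, since CPU-side layers are the bottleneck on each forward pass.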

solstice 13 hours ago | parent [-]

Thank you!