aftbit 9 hours ago
Darn, I've only got ~20 GB of VRAM. I really need to get a stronger machine for this sort of stuff.
MerrimanInd 8 hours ago
20GB isn't enough for a 13B parameter model? I thought even the 29-31B models could run on a 24GB RTX x090 card. I'm currently shopping for a local LLM setup, deciding between something like the Framework Desktop with 64-128GB of shared RAM and just adding a 3090 or 4090 to my homelab, so I'm very curious what hardware is working well for others.
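Back-of-envelope, the quantized weights are what dominate. Here's a sketch of the arithmetic (the ~1.2x overhead factor for KV cache and runtime buffers is my own rough assumption):

    # Rough VRAM estimate for a quantized model.
    # The 1.2x overhead factor (KV cache, activations, buffers) is a guess.
    def est_vram_gb(params_billion, bits_per_weight, overhead=1.2):
        # 1e9 params * (bits / 8) bytes per param / 1e9 bytes per GB
        weight_gb = params_billion * bits_per_weight / 8
        return weight_gb * overhead

    for n, bits in [(13, 4.5), (13, 8), (30, 4.5)]:
        print(f"{n}B @ {bits} bits/weight: ~{est_vram_gb(n, bits):.1f} GB")

By that math a 13B model at Q4 (~4.5 bits/weight) needs roughly 9 GB, so 20 GB should be plenty, while a ~30B model at Q4 lands around 20 GB, right at the edge of a 24 GB card.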
| ||||||||
Wowfunhappy 8 hours ago
How much system memory do you have? llama.cpp can split layers across the CPU and GPU. Speeds will be slower, of course, but it's not unusable at all.
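In llama.cpp proper that's the -ngl / --n-gpu-layers flag. Via the llama-cpp-python bindings it looks roughly like this (a sketch: the model filename is a placeholder and n_gpu_layers=20 is just an example, tune it to whatever fits your VRAM):

    from llama_cpp import Llama  # pip install llama-cpp-python, built with GPU support

    # Offload only as many layers as fit in VRAM; the remaining layers
    # stay in system RAM and run on the CPU. Path and layer count below
    # are placeholders.
    llm = Llama(
        model_path="./model-13b-q4.gguf",
        n_gpu_layers=20,  # -1 would offload every layer to the GPU
        n_ctx=4096,
    )

    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])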