Call me back when you can run these models on 16GB of RAM and any recent i5/i7. Until then, there’s no point in using these toy models.

▲

guax 5 hours ago | parent | next [-]

It's so funny, these "toy models" would have been the wet dreams of researchers not 5 years ago.

Progress marches without mercy.

▲

kgeist 3 hours ago | parent [-]

Yeah, people don't realize these "toy models" now completely destroy gpt-4o on most tasks, and no one called gpt-4o a toy model back in the day... It was OpenAI's flagship model from 2024 to 2025.

▲

Gigachad an hour ago | parent [-]

Tbh in 2024 most were calling these models useless for programming and a scam. It wasn't until this year things really changed. My experience with Qwen 3.6 is it can do things, and it's super impressive it can do things, but it's not any more productive than doing it myself.

▲

giancarlostoro 7 hours ago | parent | prev | next [-]

You need it to run in about 8 GB so you have extra space for the context window.

▲

jboss10 4 hours ago | parent | prev | next [-]

They can be run on 32GB with 8GB VRAM. I don't think these will be on 16GB for a while. (35B MoE)
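
For a rough sense of why 16GB is tight, here is a back-of-the-envelope sketch of the weight footprint alone. The only number taken from the thread is the 35B parameter count; the bit widths are illustrative assumptions, and real quantized files carry extra overhead (scales, some tensors kept at higher precision) on top of this, plus the context/KV cache.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores quantization overhead, KV cache, and runtime buffers,
    so treat the result as a lower bound.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 35B comes from the comment above; the bit widths are assumptions.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_gib(35, bits):.1f} GiB")
```

At 4 bits per weight that works out to about 16.3 GiB before any context, which is why a 16GB machine does not fit the whole model while 32GB RAM plus some VRAM does.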

▲

TheCycoONE 4 hours ago | parent [-]

I have 32GB of RAM with 16GB VRAM and I haven't had a lot of luck running larger models like this. Are you able to expand on that?

▲

slim 3 hours ago | parent [-]

Use llama.cpp with CUDA.

▲

TheCycoONE 2 hours ago | parent [-]

The problem may be that it's a 7800XT which handles memory contention by freezing.

▲

Catloafdev 7 hours ago | parent | prev [-]

Hello, it's the internet calling, today is that day.

https://github.com/ikawrakow/ik_llama.cpp

Edit: it's gonna be slow if you're not using any VRAM. But it's possible. Software isn't going to speed that up anytime soon, it's just a hardware bandwidth limit.
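
A rough way to see the bandwidth limit: during generation, each token needs the active weights read from memory once, so tokens per second cannot exceed bandwidth divided by the bytes read per token. This is a simplified sketch, not a benchmark. The bandwidth figure, the 4-bit quantization, and the 3B active-parameter count for the MoE case are all hypothetical inputs chosen for illustration.

```python
def max_tokens_per_sec(active_params_billion: float,
                       bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed when memory bandwidth is the bottleneck.

    Assumes every active weight is read once per generated token and
    ignores compute time, KV cache reads, and cache effects.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    ram_bandwidth = 50.0  # GB/s, assumed figure for dual-channel desktop RAM

    # Dense case: all 35B weights are touched for every token.
    print(f"dense 35B: {max_tokens_per_sec(35, 4, ram_bandwidth):.1f} tok/s")

    # MoE case: only the active experts are read (3B active is an assumption).
    print(f"MoE, 3B active: {max_tokens_per_sec(3, 4, ram_bandwidth):.1f} tok/s")
```

The ceiling scales with bandwidth and inversely with bytes read per token, so better software can get closer to the bound but cannot move it.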