XCSme 4 days ago

I bought a 2nd 3090 two years ago for about €800, still a good price even today I think.

It's in my main workstation, and my idea was to always have Ollama running locally. The problem is that once I have a (large-ish) model loaded, my VRAM is almost full and the GPU struggles to do things like play back a YouTube video.
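A quick way to see how much of the card a loaded model is actually taking (a minimal sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH; the numbers will vary with model size and quantization):

    import subprocess

    # Ask the driver for per-GPU memory usage; one output line per GPU.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    for line in out.stdout.strip().splitlines():
        print(line)  # e.g. "NVIDIA GeForce RTX 3090, <used> MiB, <total> MiB"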

Lately I haven't used local AI much. I stopped using coding AIs (they wasted more time than they saved), I stopped doing local image generation (the hype around it is dying down), and for quick questions I just ask ChatGPT, mostly because I often use web search and other tools, which are quicker on their platform.

lifeinthevoid 4 days ago | parent

I run my desktop environment on the iGPU and the AI stuff on the dGPUs.
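One way to keep that split clean is to start the Ollama server pinned to the card(s) you want for inference, so the GPU driving the display never gets touched. A sketch, assuming NVIDIA dGPUs and that Ollama picks up CUDA_VISIBLE_DEVICES from its environment (the device index here is hypothetical):

    import os
    import subprocess

    # Example setup: GPU 0 drives the display, GPU 1 is the 3090 used for inference.
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "1"  # hide the display GPU from Ollama

    # Launch the Ollama server restricted to that device.
    subprocess.Popen(["ollama", "serve"], env=env)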

XCSme 3 days ago | parent

That's a really good point!

Unfortunately, my CPU (a 5900X) doesn't have an iGPU.

Over the last 5 years iGPUs fell a bit out of fashion. Now they might actually make a lot of sense, as there is a clear use case that keeps the dedicated GPU permanently busy and isn't gaming (gaming is different, because you don't usually multi-task while gaming).

I do expect to see a surge in iGPU popularity, or maybe a software improvement that lets a model stay always available without constantly hogging the VRAM.

XCSme 3 days ago | parent

PS: I thought Ollama had a way to use RAM instead of VRAM (?) to keep the model active when not in use, but in my experience that didn't solve the problem.
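For reference, Ollama's API does expose a couple of knobs for this (a sketch, assuming a default local install on port 11434; the model name is just an example, and num_gpu is the number of layers offloaded to the GPU, not the number of GPUs): keep_alive controls how long the model stays resident after a request, and lowering num_gpu keeps part of the model in system RAM at the cost of speed.

    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1:8b",      # hypothetical model name
            "prompt": "Hello",
            "stream": False,
            "keep_alive": 0,             # unload the model right after responding
            "options": {"num_gpu": 20},  # offload only ~20 layers to VRAM, rest stays in RAM
        },
        timeout=300,
    )
    print(resp.json()["response"])

With keep_alive set to 0 the next request has to reload the model, so this trades the constant VRAM pressure for a slower first response.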