tensorlibb 8 days ago

I'm a huge fan of OpenRouter and their interface for solid LLMs, but I recently jumped into fine-tuning / modifying my own vision models for FPV drone detection (just for fun), and my daily workstation with its 2080 just wasn't good enough.

Even in 2025 it's cool how solid a setup dual 3090s still are. NVLink is an absolute must, but it's incredibly powerful. I'm able to run the latest Mistral thinking models and relatively powerful YOLO-based VLMs like the ones Roboflow is based on.
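For the curious, the fine-tuning loop is pretty short with the ultralytics package. A minimal sketch; the weights file, dataset config, and settings below are placeholders, not my actual setup:

    from ultralytics import YOLO  # pip install ultralytics

    # Start from pretrained COCO weights; "drones.yaml" is a placeholder
    # dataset config pointing at your own labeled FPV drone images.
    model = YOLO("yolov8n.pt")
    model.train(data="drones.yaml", epochs=50, imgsz=640, device=[0, 1])  # both 3090s

    # Run detection on a single frame.
    results = model("frame.jpg")
    results[0].show()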

Curious if anyone else is still using 3090s or has feedback on scaling up to 4-6 of them.

Thanks everyone ;)

vladgur 4 days ago | parent | next [-]

I am exploring options just for fun.

A used 3090 is around $900 on eBay; a used RTX 6000 Ada is around $5k.

Four 3090s are slower at inference and worse at training than one RTX 6000.

A 4x3090 setup would consume ~1400W at load.

The RTX 6000 would consume ~300W at load.

If you, god forbid, live in California and your power averages 45 cents per kWh, the 4x3090 setup would cost $1500+ more per year to operate than a single RTX 6000 [0].

[0] Back-of-the-napkin/ChatGPT calculation assuming the GPUs run at load for 8 hours per day.

Note: I own a PC with a 3090, but if I had to build an AI training workstation, I would seriously consider cost to operate and resale value (per component).
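For anyone checking the math, a quick sketch of that back-of-the-napkin estimate, using the wattages and 8 h/day assumption above:

    # Assumes the numbers quoted above: 1400W vs 300W at load,
    # 8 hours/day, 365 days/year, $0.45/kWh.
    kwh_rate = 0.45
    hours_per_year = 8 * 365

    cost_4x3090  = 1.4 * hours_per_year * kwh_rate   # ~$1840/yr
    cost_rtx6000 = 0.3 * hours_per_year * kwh_rate   # ~$394/yr
    print(cost_4x3090 - cost_rtx6000)                # ~$1445/yr difference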

ismailmaj 4 days ago | parent | next [-]

To make matters worse, the RTX 3090 was released during the crypto craze, so a decent chunk of the second-hand market could be overused GPUs that won't last long. Even though the 3xxx-to-4xxx performance difference is not that big, I would avoid the 3xxx series entirely for resale value.

aunty_helen 4 days ago | parent | next [-]

I bought 2 ex-mining 3090s ~3 years ago. They're in an always-on PC that I remote into. Haven't had a problem. If there were mass failures of GPUs due to mining, I would expect to have heard more about it.

segmondy 4 days ago | parent | prev [-]

I have a rig of 7 3090s that I bought from crypto bros. They're holding up just fine and have been chugging along for the last 2 years. GPUs are electronic devices, not mechanical ones; they rarely blow up.

akulbe 4 days ago | parent | next [-]

How do you have a rig that fits that many cards?? Those things take 3 slots apiece.

Pictures, or it never happened! :D

dehugger 3 days ago | parent [-]

You get a motherboard designed for the purpose (many PCIe slots) and a case (usually an open frame) that holds that many cards. Riser cables are used so that not every card plugs directly into the motherboard.

jonbiggums22 4 days ago | parent | prev [-]

I've noticed on eBay there are a lot of 3090s for sale that seem to have rusted or corroded heatsinks. I actually can't recall seeing this with used GPUs before, but maybe I just haven't been paying attention. Does this have to do with running them flat out in a basement or something?

dwood_dev 4 days ago | parent [-]

Run them near a saltwater source without AC and that will happen.

supermatt 4 days ago | parent | prev | next [-]

I guess it depends on what you want to do: you get half the VRAM with the 6000 (48GB @ ~$104/GB) vs 4x3090 (96GB @ ~$37.50/GB).

cfn 4 days ago | parent | prev | next [-]

I have an A6000 and the main advantage over a 3090 cluster is the build simplicity and relative silence of the machine (it is also used as my main dev workstation).

logicallee 4 days ago | parent | prev | next [-]

>I am exploring options just for fun.

Since you're exploring options just for fun, out of curiosity: would you rent it out whenever you're not using it yourself, so it's not just sitting idle? (It could be noisy.) You'd be able to use your computer for other work at the same time and stop whenever you wanted to use it yourself.

vladgur 4 days ago | parent [-]

It depends. At my electricity cost, an hour of the 3090 or an hour of the RTX 6000 would cost about the same: 0.45.

Just checked vast.ai: I would be losing money renting out a 3090 at my electricity cost, and making a tiny bit with the RTX 6000.

Like with boats, it's probably better to rent GPUs than buy them.

logicallee 4 days ago | parent | next [-]

(You should also be compensated for the noise and inconvenience from it, not only the electricity.) It sounds like you might rent it out if the rental price were higher.

justinclift 4 days ago | parent | prev [-]

Would a solar panel setup be an option for fixing that? :)

segmondy 4 days ago | parent | prev [-]

... and this is why napkin calculations are terrible. Running a GPU at load doesn't mean you are going to use the full wattage. Four 3090s running inference on a large model barely use 350 watts combined.

vladgur 3 days ago | parent [-]

Can you clarify? Even if you downclock the cards to 300W, why would running them at load not consume 4x300W?

segmondy a day ago | parent [-]

Inference often draws around 200-250W on the active card without any downclocking, and the other cards sit at around 20-50W. With 4 cards, only 1 card is active at a time. To hit the full 350W you need to run parallel inference on the card with multiple users. So if I were using it as a server card with 10 active users/processes, then I might max out the active card. For example, I have a rig with 10 MI50 cards, which I believe are 250W each. Yet I rarely see the active card pass 200W, and the others idle at about 20W, so that's 180W + 200W = around 380-400W at full load.

Think of the max wattage like a car's max horsepower: a car might make 350HP, but that doesn't mean it makes 350HP all day long; there's a curve to it. At the low end it might be making 170HP, and you need to floor the gas pedal to get to that 350HP. Same with these GPUs. Most people calculate the gas mileage by finding how much gas a car consumes at its peak and say, oh, 6mpg when it's making 350HP, so with your 20-gallon tank you have a range of 120 miles. Which obviously isn't true.
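Putting rough numbers on that for a 4x3090 rig doing single-stream, layer-split inference. The wattages are the ballpark figures above, not measurements:

    # Rough estimate of wall power: one card active at a time, the rest near idle.
    active_w = 225          # active card, mid-range of the 200-250W figure above
    idle_w   = 35           # idle card, mid-range of the 20-50W figure above
    cards    = 4

    total_w = active_w + (cards - 1) * idle_w
    print(total_w)          # ~330W, far below the 4 x 350W = 1400W nameplate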

jacquesm 4 days ago | parent | prev | next [-]

I've built a rig with 14 of them. NVLink is not 'an absolute must'; it can be useful depending on the model, the application software you use, and whether you're training or inferring.

The most important figure is the power consumed per token generated. You can optimize for that and get a reasonably efficient system, or you can maximize token generation speed and end up with twice the power consumption for very little gain. You will also likely need a way to get rid of the excess heat, and all those fans get loud. I stuck the system in my garage, which made the noise much more manageable.
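One way to make that concrete is to compare energy per generated token across operating points. The numbers in this sketch are purely illustrative, not measurements from my rig:

    # Energy per generated token = wall power / generation speed.
    def joules_per_token(watts, tokens_per_sec):
        return watts / tokens_per_sec

    efficient = joules_per_token(600, 25)    # power-limited cards (illustrative)
    fast      = joules_per_token(1200, 28)   # cards run flat out (illustrative)
    print(efficient, fast)                   # 24 J/token vs ~43 J/token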

breakds 4 days ago | parent [-]

I am curious about the setup of 14 GPUs - what kind of platform (motherboard) do you use to support so many PCIe lanes? And do you even have a chassis? Is it rack-mounted? Thanks!

jacquesm 4 days ago | parent [-]

I used a large Supermicro server chassis, a dual-Xeon motherboard with seven 8-lane PCI Express slots, all the RAM it would take (bought second hand), splitters, and four massive power supplies. I extended the server chassis with aluminum angle riveted onto the base. It could be rack mounted, but I'd hate to be the person lifting it in. The 3090s were a mix: 10 of the same type (small, with blower-style fans) and 4 much larger ones that were kind of hard to accommodate (much wider and longer). I've linked to the splitter board manufacturer in another comment in this thread. That's the 'hard to get' component, but once you have those and good cables to go with them, the remaining setup problems are mostly power and heat management.

breakds 3 days ago | parent [-]

Thanks, that is very inspiring. I thought there were no blower-type consumer GPUs, but apparently they exist!

jacquesm 3 days ago | parent [-]

I got them second hand off some bitcoin mining guy.

https://www.tomshardware.com/news/asus-blower-rtx3090

That's the model I have.

AJRF 4 days ago | parent | prev | next [-]

You really don't need NVLink; you won't saturate the PCIe lanes on a modern motherboard with dual 3090s.

Tim Dettmers' amazing GPU blog post posits that NVLink doesn't start to become useful until you're at 128+ GPUs:

https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...

fxtentacle 4 days ago | parent | prev | next [-]

The 3090 is a sweet spot for training. It's the first generation with seriously fast VRAM, and it's the last generation before Nvidia removed NVLink. If you need to copy parameters between GPUs during training, the 3090 can be up to 70% faster than a 4090 or 5090, because the latter two are limited by PCI Express bandwidth.
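If you want to see where your own cards land, a rough PyTorch sketch that times a device-to-device copy; the 1 GiB tensor size is arbitrary, and real training traffic patterns will differ:

    import torch

    # Time a device-to-device copy to see whether NVLink/PCIe is the bottleneck.
    x = torch.empty(512 * 1024 * 1024, dtype=torch.float16, device="cuda:0")  # 1 GiB

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    y = x.to("cuda:1", non_blocking=True)
    end.record()
    torch.cuda.synchronize()

    gib = x.numel() * x.element_size() / 2**30
    print(f"{gib / (start.elapsed_time(end) / 1000):.1f} GiB/s")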

jacquesm 4 days ago | parent [-]

To be fair though, the 4090 and 5090 are much more capable of saturating PCI Express than the 3090 is. Even at 4 lanes per card, the 3090 rarely manages to saturate the links, so it still pays off handsomely to split down to 4 lanes and add more cards.

I used:

https://c-payne.com/

Very high quality and manageable prices.

ericdotlee 4 days ago | parent [-]

I've purchased 16 of these - cpayne is great! Hope he finds a US distributor to help with the tariffs a bit!

jacquesm 4 days ago | parent [-]

What blew me away is the quality and price point of what obviously can't be a very high volume product. This guy makes amazing stuff.

XCSme 4 days ago | parent | prev | next [-]

I bought a second 3090 two years ago for around €800, still a good price even today I think.

It's in my main workstation, and my idea was to always have Ollama running locally. The problem is that once I have a (large-ish) model loaded, my VRAM is almost full and the GPU struggles to do things like play back a YouTube video.

Lately I haven't used local AI much. I stopped using coding AIs (they wasted more time than they saved), I stopped doing local image generation (the AI image-generation hype is dying down), and for quick questions I just ask ChatGPT, mostly because I also often use web search and other tools, which are quicker on their platform.

lifeinthevoid 4 days ago | parent [-]

I run my desktop environment on the iGPU and the AI stuff on the dGPUs.

XCSme 3 days ago | parent [-]

That's a really good point!

Unfortunately, my CPU (5900X) doesn't have an iGPU.

Over the last 5 years iGPUs fell a bit out of fashion. Now they might actually make a lot of sense, as there is a clear use case that keeps the dedicated GPU always in use and isn't gaming (and gaming is different, because you don't often multitask while gaming).

I do expect to see a surge in iGPU popularity, or maybe a software improvement that allows keeping a model always available without constantly hogging the VRAM.

XCSme 3 days ago | parent [-]

PS: I thought Ollama had a way to use RAM instead of VRAM (?) to keep the model active when not in use, but in my experience that didn't solve the problem.
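One thing that may be worth trying (not quite RAM offload, but it frees the VRAM between uses) is Ollama's keep_alive setting. A quick sketch against the local API, with a placeholder model name:

    import requests

    # Ask Ollama to free VRAM right after answering by setting keep_alive=0.
    # "llama3" is a placeholder; adjust the host/port if your setup differs.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "Why is the sky blue?",
            "stream": False,
            "keep_alive": 0,   # unload the model from VRAM as soon as it finishes
        },
    )
    print(resp.json()["response"])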

CraigJPerry 4 days ago | parent | prev | next [-]

If it's just for detection, would audio not be cheaper to process?

I'm imagining a cluster of directional microphones, and then I don't know whether it's better to perform some sort of band-pass filtering first, since it's so computationally cheap, or whether it's better to just feed everything into the model directly. No idea.
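The band-pass step really is cheap. A rough scipy sketch; the cutoff frequencies are just a guess at where small-prop noise sits:

    import numpy as np
    from scipy.signal import butter, sosfilt

    # Cheap pre-filtering before the model: keep roughly the band where
    # small-propeller harmonics might live. The 150-8000 Hz cutoffs are a guess.
    def bandpass(audio, sample_rate=48_000, low_hz=150, high_hz=8_000):
        sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
        return sosfilt(sos, audio)

    mic_frame = np.random.randn(48_000)   # stand-in for one second of mic input
    filtered = bandpass(mic_frame)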

I guess my first thought was just that the sound from a drone is likely detectable reliably at a greater distance than the visuals; they're so small, and a 180-degree by 180-degree hemisphere of pixels is a lot to process.

Fun problem either way.
