kpw94 14 hours ago
> I've personally decided to just rent systems with GPUs from a cloud provider and set up SSH tunnels to my local system.

That's a good idea! Curious about this, if you don't mind sharing:

- What's the stack? (Do you run something like llama.cpp on that rented machine?)
- What model(s) do you run there?
- What's your rough monthly cost? (Does it come out much cheaper than calling the equivalent paid APIs?)
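For anyone wanting to replicate the tunnel part of the quoted setup, a minimal sketch; the hostname, user, and ports here are placeholders, not anything from the original comment:

    # Forward local port 8080 to the inference server on the rented box.
    # "user@gpu-box" and both ports are hypothetical placeholders.
    ssh -N -L 8080:localhost:8080 user@gpu-box

    # Local clients can then talk to http://localhost:8080 as if the
    # server were running on this machine.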
clusterhacks 12 hours ago
I ran ollama first because it was easy, but now I download the source and build llama.cpp on the machine myself. I don't bother saving a file system between runs on the rented machine; I build llama.cpp every time I start up.

I am usually just running gpt-oss-120b or one of the qwen models. Sometimes gemma? These are mostly "medium" sized in terms of memory requirements - I'm usually trying unquantized models that will easily run on a single 80-ish GB GPU, because those are cheap. I tend to spend $10-$20 a week, but I am almost always prototyping or testing an idea for a specific project that doesn't require me to run 8 hrs/day.

I don't use the paid APIs for several reasons, but cost-effectiveness is not one of them.
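A minimal sketch of that build-on-every-startup flow, assuming a CUDA-capable box and llama.cpp's standard CMake build; the model repo and port are illustrative, not the commenter's exact setup:

    # Fresh clone and build each time; nothing persists between runs.
    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON        # enable the CUDA backend
    cmake --build build --config Release -j

    # Serve a model over HTTP; -hf fetches the GGUF from Hugging Face
    # on first use. Model repo and port are illustrative.
    ./build/bin/llama-server -hf ggml-org/gpt-oss-120b-GGUF --port 8080

Since nothing is saved between runs, the only state that matters is this script plus whatever the model host caches; the rebuild typically costs a few minutes of otherwise-billed GPU time at startup.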