It's really just a performance tradeoff, and a question of where your acceptable performance level sits.
Ollama, for example, will let you run any available model on just about any hardware. But using the CPU alone is _much_ slower than running it on any reasonable GPU, and obviously CPU performance varies massively too.
You can even run models that are bigger than your available RAM, but performance will be terrible.
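If you want to see where your own setup actually lands, Ollama's local API reports token counts and timings with each response, so you can compute tokens per second yourself. Here's a rough Python sketch, assuming Ollama is running on its default port; the model name and prompt are just placeholders for whatever you have pulled locally:

```python
# Minimal sketch: measure generation speed against a local Ollama server.
# Assumes Ollama is listening on its default port (11434) and that the
# named model has already been pulled -- swap in whatever you actually use.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # placeholder model name
        "prompt": "Explain GPU offloading in one paragraph.",
        "stream": False,
    },
    timeout=600,  # CPU-only runs can take a while
).json()

# eval_count is the number of generated tokens; eval_duration is nanoseconds
tokens = resp["eval_count"]
seconds = resp["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/sec")
```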
The ideal case is to have a fast GPU and run a model that fits entirely within the GPU's memory. In these cases you might measure the model's processing speed in tens of tokens per second.
The further you get from that ideal, the slower things run. On a CPU alone with a model that fits in RAM, you'd be maxing out in the low single digits of tokens per second, and on lower-performance hardware you start talking about seconds per token instead. If the model doesn't fit in RAM, the measurement becomes minutes per token.
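To put those tiers in wall-clock terms, here's the back-of-the-envelope arithmetic for a typical ~500-token answer. The rates are illustrative round numbers, not benchmarks of any particular setup:

```python
# Rough wall-clock time for a ~500-token answer at each performance tier.
answer_tokens = 500
tiers = [
    ("GPU, model fits in VRAM", 40),        # tens of tokens/sec
    ("CPU only, model fits in RAM", 3),     # low single digits
    ("Model swapping to disk", 1 / 60),     # roughly a token a minute
]
for label, tokens_per_sec in tiers:
    minutes = answer_tokens / tokens_per_sec / 60
    print(f"{label:30s} ~{minutes:7.1f} minutes")
```

That's the difference between an answer in seconds, an answer in a few minutes, and an answer sometime tomorrow.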
For most people, the minimum acceptable performance is in the double-digit tokens per second range, which is why they optimize for it: a high-end GPU with as much memory as possible, and a model chosen to fit inside the GPU's RAM. But in theory you can run large models on a potato, if you're prepared to wait until next week for an answer.
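If you want a rough sense of what "fits inside the GPU's RAM" means before downloading anything, the usual estimate is parameter count times bytes per weight at your quantization, plus some allowance for the KV cache and runtime. A sketch of that arithmetic, with the 20% overhead figure being a guess rather than a rule:

```python
# Rough "does it fit?" estimate: weight memory is parameter count times
# bytes per weight for the chosen quantization, plus overhead for the
# KV cache and runtime. Treat the output as a ballpark, not a guarantee.
def estimated_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 0.2) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return weights_gb * (1 + overhead)

for params, bits in [(7, 4), (13, 4), (70, 4), (70, 16)]:
    print(f"{params}B model at {bits}-bit: ~{estimated_vram_gb(params, bits):.0f} GB")
```

So a 7B model at 4-bit quantization fits comfortably on a modest GPU, while a 70B model at full precision is out of reach for anything short of datacenter hardware.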