The core of the issue is that you need beefy GPUs to actually run these models under production workloads.
So I think what you're currently imagining won't happen until GPU prices come down massively.