Having looked at API providers like Together that host open-source models such as Llama 70B, and having run these models in production myself, I'm confident they have healthy margins (and their inference stacks are far better optimized than mine).
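A rough back-of-envelope sketch of why margins can be healthy: with good batching, the cost per token of a self-hosted 70B model can sit well below typical API prices. Every number below is an illustrative assumption (GPU rental rate, replica size, aggregate throughput, API price), not a measured figure for Together or any other provider.

```python
# Back-of-envelope gross-margin estimate for hosting an open-source LLM.
# All constants are hypothetical assumptions for illustration only.

GPU_COST_PER_HOUR = 2.50         # assumed rental cost of one H100-class GPU, USD/hour
GPUS_PER_REPLICA = 4             # assumed GPUs needed to serve a 70B model
TOKENS_PER_SECOND = 6000         # assumed aggregate throughput with heavy batching
PRICE_PER_MILLION_TOKENS = 0.90  # assumed API price, USD per 1M output tokens

cost_per_hour = GPU_COST_PER_HOUR * GPUS_PER_REPLICA
tokens_per_hour = TOKENS_PER_SECOND * 3600
cost_per_million = cost_per_hour / (tokens_per_hour / 1_000_000)
gross_margin = 1 - cost_per_million / PRICE_PER_MILLION_TOKENS

print(f"cost per 1M tokens: ${cost_per_million:.2f}")
print(f"gross margin: {gross_margin:.0%}")
```

The margin is extremely sensitive to the throughput assumption, which is exactly where a well-optimized inference stack (continuous batching, paged KV cache, quantization) pays off.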