minraws 5 hours ago
Given my experience hosting these models at scale, handling and optimizing load, I don't think the margins are anywhere near 75% if the models are as big as people often claim. I don't know why DeepSeek is so cheap, but realistic pricing should be around their initial price, which was 4x higher. At that price you have a healthy 25-50% margin depending on occupancy, given that DeepSeek V4 is a very sparse MoE model. GLM 5.2, for example, doesn't have more than 30-50% margins, and that's assuming old GPU pricing; with current inflated GPU pricing I am certain the margins must be lower. Ofc you can host for cheaper with quantization, or if you have very consistent capacity/utilization, which is not the norm with AI workloads.

Overall, for large models like GPT 5.5 or Opus there must be healthier margins of around 50-70%, assuming GPU pricing didn't increase for these companies. Even if it did, a 30-40% margin should be possible, even in the worst case where every GPU they have saw a jump in pricing. For smaller models it's hard to say. I would guess 20%, but these models might be much smaller than I suspect, in which case it might be double that.

Note that the issue is that less intelligent tokens don't scale down linearly in memory usage, and memory is the biggest pain point of serving models. Context sizes have fucked us all. Also, anyone claiming OAI makes lower margins on its APIs might be wrong, given that they are on a much smaller context size. 1M context is definitely a lot more expensive to serve, especially with smaller models like Sonnet.
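To make the arithmetic concrete, here is a rough sketch of the two effects described above: margin as a function of occupancy, and memory as a function of context size. Everything in it is an assumption for illustration, not a measurement. The function names, the prices, the throughput, and the layer/head counts are all hypothetical, and the KV-cache formula assumes plain grouped-query attention with no cache compression, which a real deployment may not use.

```python
def gross_margin(price_per_mtok, gpu_cost_per_hour, peak_tokens_per_sec, occupancy):
    """Gross margin as a fraction of revenue for one serving unit.

    occupancy is the fraction of peak throughput that is actually sold (0..1].
    Only GPU rental cost is counted; staff, networking, etc. are ignored.
    """
    tokens_per_hour = peak_tokens_per_sec * occupancy * 3600
    revenue_per_hour = price_per_mtok * tokens_per_hour / 1_000_000
    return (revenue_per_hour - gpu_cost_per_hour) / revenue_per_hour


def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """KV-cache size in bytes for ONE sequence of context_len tokens.

    Assumes standard attention: one key and one value vector per KV head,
    per layer, per token (hence the leading factor of 2).
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_len


if __name__ == "__main__":
    # Hypothetical unit: $20/hour of GPUs, 10k tokens/s at peak, $2 per million tokens.
    for occ in (0.3, 0.5, 0.8):
        print(f"occupancy {occ:.0%}: margin {gross_margin(2.0, 20.0, 10_000, occ):.0%}")

    # Hypothetical small model: 32 layers, 8 KV heads, head_dim 128, 16-bit cache.
    for ctx in (128_000, 1_000_000):
        gb = kv_cache_bytes(ctx, 32, 8, 128) / 1e9
        print(f"context {ctx:>9,}: {gb:.1f} GB of KV cache per sequence")
```

The cache grows linearly with context length but depends only on layer and head dimensions, not on total parameter count, so a small model's weights can end up much smaller than the cache for a single long sequence. That is one way to read the point about 1M context being disproportionately expensive on smaller models.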