nylonstrung 8 hours ago
Unless I misunderstood, it seems like this is trailing the Pareto frontier in cost and speed. Compare it to providers like Fireworks: even with OpenRouter's 5% charge, it's not competitive.
linolevan 4 hours ago
Among the providers I keep track of, Cumulus is typically pretty price-competitive. The exceptions are MiniMax, where DeepInfra and Together are much cheaper, and GLM-5, where DeepInfra and z.AI's own hosting are much cheaper. (Technically, Novita also comes in first on Qwen3 8B, but only barely.)
2uryaa 6 hours ago
Our SLA is actually higher, and we are lower priced. We are also using this as a stepping stone toward serving fine-tuned models for much less than Fireworks/Together, without the horrible cold starts of Modal. We're essentially trying to prove that our engine can hang with the best providers while multiplexing models.