nylonstrung 8 hours ago

Unless I misunderstood, it seems like this is trailing the Pareto frontier in cost and speed.

Compare it to providers like Fireworks: even with the OpenRouter 5% charge, it's not competitive.

linolevan 4 hours ago | parent | next [-]

According to the providers that I keep track of, Cumulus is typically pretty price competitive, except for MiniMax, where DeepInfra and Together are much cheaper, and GLM-5, where DeepInfra and z.AI's own hosting are much cheaper.

(Also, technically, Qwen3 8B, where Novita is in first place, but barely.)

2uryaa 6 hours ago | parent | prev [-]

Our SLA is actually higher, and we are lower priced. We are also using this as a step toward serving finetuned models for much cheaper than Fireworks/Together, without the horrible cold starts of Modal. We're essentially trying to prove that our engine can hang with the best providers while multiplexing models.