howinator 3 days ago

I could be wrong, but I think this pricing is the first to admit that cost scales quadratically with the number of tokens. It's the first time I've seen nonlinear pricing from an LLM provider, and it implicitly mirrors the inference scaling laws I think we're all aware of.
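To make the intuition concrete, here's a minimal sketch (all rates are made-up, not any provider's actual schedule): if the effective per-token rate itself grows linearly with context length, mirroring the O(n^2) cost of attention, then total request cost grows roughly quadratically.

    # Hypothetical illustration: per-token rate rises linearly with
    # context length, so total cost is ~quadratic in token count.
    def total_cost(tokens: int, base_rate: float = 3e-6, slope: float = 1e-11) -> float:
        """Dollar cost for a request of `tokens` input tokens.

        base_rate: flat $/token floor (made-up number)
        slope: extra $/token per token of context (made-up number)
        """
        rate = base_rate + slope * tokens  # effective $/token grows with n
        return rate * tokens               # => cost ~ slope * n^2 for large n

    for n in (10_000, 100_000, 1_000_000):
        print(f"{n:>9,} tokens -> ${total_cost(n):.2f}")

A 10x increase in context here yields much more than a 10x increase in cost, which is the nonlinearity the flat per-token pricing of most providers has so far hidden.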

jpau 3 days ago

Google[1] also has a "long context" pricing structure. OpenAI may be considering something similar, since they do not offer their priority-processing SLAs[2] for contexts >128K.

[1] https://cloud.google.com/vertex-ai/generative-ai/pricing

[2] https://openai.com/api-priority-processing/

energy123 3 days ago

Is this marginal pricing, or do your total costs double when you go from 200,000 to 200,001 tokens?
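For anyone unsure what the distinction means, here's a sketch of the two readings with hypothetical rates and a 200K threshold (the real rates and the provider's actual behavior are not specified here):

    # Two readings of a 200K pricing threshold, with made-up rates.
    THRESHOLD = 200_000
    LOW, HIGH = 3e-6, 6e-6  # $/token below/above the threshold (hypothetical)

    def marginal(tokens: int) -> float:
        """Tiered: only tokens past the threshold pay the higher rate."""
        below = min(tokens, THRESHOLD)
        above = max(tokens - THRESHOLD, 0)
        return below * LOW + above * HIGH

    def cliff(tokens: int) -> float:
        """Cliff: crossing the threshold reprices ALL tokens at the higher rate."""
        rate = HIGH if tokens > THRESHOLD else LOW
        return tokens * rate

    for n in (200_000, 200_001):
        print(f"{n:,}: marginal=${marginal(n):.6f}  cliff=${cliff(n):.6f}")

Under marginal pricing, token 200,001 adds a fraction of a cent; under cliff pricing, that one extra token roughly doubles the total bill, which is exactly the worry behind the question.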