aragonite 11 hours ago

Do long sessions also burn through token budgets much faster?

If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?
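A quick sketch of why that happens, with made-up numbers (assume each turn adds ~500 tokens of new content; the per-turn size is purely illustrative):

```python
# Sketch of how resending the full history inflates input tokens.
# Turn sizes here are hypothetical, not from any real pricing model.
def cumulative_input_tokens(turn_sizes):
    """Input tokens billed per request when the whole history is resent."""
    history = 0
    billed = []
    for size in turn_sizes:
        history += size          # new turn appended to the conversation
        billed.append(history)   # each request carries the entire history
    return billed

per_request = cumulative_input_tokens([500] * 10)
print(per_request[0], per_request[-1], sum(per_request))
# first request: 500 tokens; tenth: 5000; total billed: 27500
```

So the tenth request is ten times heavier than the first, and total billed input grows quadratically with conversation length.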

dathery 11 hours ago | parent | next [-]

That's correct. Input caching helps, but even then, at e.g. 800k tokens with all of them cached and a cached-input price of $0.50 per million tokens, the API price is $0.50 * 0.8 = $0.40 per request, which adds up really fast. A "request" can be e.g. a single tool call response, so you can easily end up making many $0.40 requests per minute.
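The same arithmetic in code (the $0.50/MTok cached-input price is the illustrative figure from above; real prices vary by model and provider):

```python
# Sketch of the per-request cost arithmetic above.
# CACHED_PRICE_PER_MTOK is illustrative, not a quoted real price.
CACHED_PRICE_PER_MTOK = 0.50  # dollars per million cached input tokens

def request_cost(cached_tokens):
    """Cost in dollars of one request whose input is fully cached."""
    return CACHED_PRICE_PER_MTOK * cached_tokens / 1_000_000

cost = request_cost(800_000)
print(f"${cost:.2f} per request")                       # $0.40 per request
print(f"${cost * 25:.2f} for ~25 tool-call requests")   # $10.00
```

Which is also why a prompt that fans out into a couple dozen tool calls lands in the ten-dollar range.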

acjohnson55 10 hours ago | parent [-]

Interesting, so a prompt that causes a couple dozen tool calls will end up costing in the tens of dollars?

isbvhodnvemrwvn 5 hours ago | parent [-]

Not necessarily. Take a look at e.g. the OpenAI Responses API: you can get multiple tool calls in one response, and of course reply with multiple results.
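A sketch of what that looks like, loosely modeled on the shape of OpenAI's Responses API (one response can carry several function calls, and the follow-up request answers all of them at once). No network call here; this just builds the follow-up payload, and the call IDs and tool names are made up:

```python
# Build one follow-up input answering every tool call from a single
# response, so N tool calls cost one round trip instead of N.
def follow_up_input(tool_calls, results):
    """Pair each tool call with its output in one batched follow-up."""
    return [
        {"type": "function_call_output",
         "call_id": call["call_id"],
         "output": output}
        for call, output in zip(tool_calls, results)
    ]

calls = [{"call_id": "call_1", "name": "get_weather"},
         {"call_id": "call_2", "name": "get_time"}]
payload = follow_up_input(calls, ['{"temp": 21}', '{"time": "14:02"}'])
print(len(payload))  # 2 tool results, one request
```

Batching like this cuts the number of requests, which matters when each request re-bills the whole cached prefix.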

jasondclinton 11 hours ago | parent | prev [-]

If you use context caching, it saves quite a lot on costs/budgets. You can cache 900k tokens if you want.
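A minimal sketch of what opting into caching looks like, following the shape of Anthropic's prompt-caching API (a `cache_control` breakpoint on a content block marks everything up to it as cacheable). No request is sent; this just builds the payload, and the model name and context string are illustrative:

```python
# Mark a large, stable prefix (system prompt / docs) as cacheable so
# subsequent requests re-bill it at the cheaper cached-input rate.
LONG_CONTEXT = "...hundreds of thousands of tokens of docs..."

request_body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_CONTEXT,
            # breakpoint: content up to and including this block is cached
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the docs."}],
}
print("cache_control" in request_body["system"][0])  # True
```

Only the per-turn messages after the breakpoint are billed at the full input rate, which is what keeps long sessions affordable.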