aragonite 11 hours ago

Do long sessions also burn through token budgets much faster?

If the chat client is resending the whole conversation each turn, then once you're deep into a session every request already includes tens of thousands of tokens of prior context. So a message at 70k tokens into a conversation is much "heavier" than one at 2k (at least in terms of input tokens). Yes?
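A quick sketch of why that happens, with made-up numbers (assume each turn adds ~500 tokens of new content; the per-turn size is purely illustrative):

```python
# Sketch of how resending the full history inflates input tokens.
# Turn sizes here are hypothetical, not from any real pricing model.
def cumulative_input_tokens(turn_sizes):
    """Input tokens billed per request when the whole history is resent."""
    history = 0
    billed = []
    for size in turn_sizes:
        history += size          # new turn appended to the conversation
        billed.append(history)   # each request carries the entire history
    return billed

per_request = cumulative_input_tokens([500] * 10)
print(per_request[0], per_request[-1], sum(per_request))
# first request: 500 tokens; tenth: 5000; total billed: 27500
```

So the tenth request is ten times heavier than the first, and total billed input grows quadratically with conversation length.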

dathery 11 hours ago | parent | next [-]

That's correct. Input caching helps, but even then, at e.g. 800k tokens with all of them cached and a cached-input price of $0.50 per million tokens, the API price is $0.50 * 0.8 = $0.40 per request, which adds up really fast. A "request" can be e.g. a single tool call response, so you can easily end up making many $0.40 requests per minute.
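The same arithmetic in code (the $0.50/MTok cached-input price is the illustrative figure from above; real prices vary by model and provider):

```python
# Sketch of the per-request cost arithmetic above.
# CACHED_PRICE_PER_MTOK is illustrative, not a quoted real price.
CACHED_PRICE_PER_MTOK = 0.50  # dollars per million cached input tokens

def request_cost(cached_tokens):
    """Cost in dollars of one request whose input is fully cached."""
    return CACHED_PRICE_PER_MTOK * cached_tokens / 1_000_000

cost = request_cost(800_000)
print(f"${cost:.2f} per request")                       # $0.40 per request
print(f"${cost * 25:.2f} for ~25 tool-call requests")   # $10.00
```

Which is also why a prompt that fans out into a couple dozen tool calls lands in the ten-dollar range.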

acjohnson55 10 hours ago | parent [-]

Interesting, so a prompt that causes a couple dozen tool calls will end up costing in the tens of dollars?

isbvhodnvemrwvn 5 hours ago | parent [-]

Not necessarily. Take a look at e.g. the OpenAI Responses API: you can get multiple tool calls in one response, and of course reply with multiple results.
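A sketch of what that looks like, loosely modeled on the shape of OpenAI's Responses API (one response can carry several function calls, and the follow-up request answers all of them at once). No network call here; this just builds the follow-up payload, and the call IDs and tool names are made up:

```python
# Build one follow-up input answering every tool call from a single
# response, so N tool calls cost one round trip instead of N.
def follow_up_input(tool_calls, results):
    """Pair each tool call with its output in one batched follow-up."""
    return [
        {"type": "function_call_output",
         "call_id": call["call_id"],
         "output": output}
        for call, output in zip(tool_calls, results)
    ]

calls = [{"call_id": "call_1", "name": "get_weather"},
         {"call_id": "call_2", "name": "get_time"}]
payload = follow_up_input(calls, ['{"temp": 21}', '{"time": "14:02"}'])
print(len(payload))  # 2 tool results, one request
```

Batching like this cuts the number of requests, which matters when each request re-bills the whole cached prefix.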

jasondclinton 11 hours ago | parent | prev [-]

If you use context caching, it saves quite a lot on costs/budgets. You can cache 900k tokens if you want.
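A minimal sketch of what opting into caching looks like, following the shape of Anthropic's prompt-caching API (a `cache_control` breakpoint on a content block marks everything up to it as cacheable). No request is sent; this just builds the payload, and the model name and context string are illustrative:

```python
# Mark a large, stable prefix (system prompt / docs) as cacheable so
# subsequent requests re-bill it at the cheaper cached-input rate.
LONG_CONTEXT = "...hundreds of thousands of tokens of docs..."

request_body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_CONTEXT,
            # breakpoint: content up to and including this block is cached
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the docs."}],
}
print("cache_control" in request_body["system"][0])  # True
```

Only the per-turn messages after the breakpoint are billed at the full input rate, which is what keeps long sessions affordable.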