Joeri 8 hours ago

This sounds like one of those problems where the solution is not a UX tweak but an architecture change. Perhaps the prompt cache should be made long-term resumable by storing it to disk before discarding it from memory?

kivle 7 hours ago | parent | next [-]

I agree. Maybe parts of the cache contents are business secrets, but then store a server-side encrypted version on the user's disk so that it can be resumed without wasting 900k tokens?

slashdave 5 hours ago | parent | prev [-]

Disk where? LLM requests are routed dynamically. You might not even land in the same data center.

FuckButtons 3 hours ago | parent [-]

But if you have a tiered cache, then waiting several seconds or minutes is still preferable to a full cache miss. I suspect the larger problem is that the amount of tinkering they're doing with the model makes that not viable.
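The tiered-cache idea above can be sketched roughly as follows. This is a hypothetical illustration, not any provider's real API: `TieredPromptCache`, the byte-blob KV state, and the write-through layout are all assumptions. The point is only that a slow disk hit (promote back to memory, pay seconds of latency) still beats a true miss (re-prefill the whole prompt).

```python
# Hypothetical sketch of a two-tier prompt cache: a fast in-memory tier
# backed by a slower on-disk tier that survives memory eviction.
# Names and the bytes-blob KV representation are illustrative assumptions.
import os
import hashlib

class TieredPromptCache:
    def __init__(self, cache_dir: str):
        self.memory = {}            # fast tier: prompt hash -> cached KV state
        self.cache_dir = cache_dir  # slow tier: persists across eviction

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def _path(self, key: str) -> str:
        return os.path.join(self.cache_dir, key)

    def put(self, prompt: str, kv_state: bytes) -> None:
        key = self._key(prompt)
        self.memory[key] = kv_state
        with open(self._path(key), "wb") as f:  # write-through to disk
            f.write(kv_state)

    def evict_from_memory(self, prompt: str) -> None:
        # Free the fast tier; the disk copy stays resumable.
        self.memory.pop(self._key(prompt), None)

    def get(self, prompt: str):
        key = self._key(prompt)
        if key in self.memory:       # fast hit: no extra latency
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):     # slow hit: seconds, not a full re-prefill
            with open(path, "rb") as f:
                kv_state = f.read()
            self.memory[key] = kv_state  # promote back to the fast tier
            return kv_state
        return None                  # true miss: recompute the prompt prefix
```

The dynamic-routing objection still applies: for this to work across data centers, the disk tier would need to be a shared (or user-local, as suggested above) store rather than a node-local one.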