Joeri 8 hours ago
This sounds like one of those problems where the solution is not a UX tweak but an architecture change. Perhaps the prompt cache should be made resumable long-term, by storing it to disk before it is discarded from memory?
kivle 7 hours ago | parent
I agree. Maybe parts of the cache contents are business secrets, but then the server could store an encrypted version on the user's disk so the session can be resumed without wasting 900k tokens.
slashdave 5 hours ago | parent
Disk where? LLM requests are routed dynamically; you might not even land in the same data center.