2001zhaozhao 3 hours ago

Caching might be cheap, but I think making it cost nothing at the API level is not a great idea either, considering that LLM attention currently gets more expensive the more tokens there are in context.

Making caching free would price "100000 token cache, 1000 read, 1000 write" the same as "0 token cache, 1000 read, 1000 write", whereas the first one likely costs more compute to run. I might be wrong about the scale of the effect here though.
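A rough back-of-envelope sketch of the point (illustrative only; it assumes attention compute per generated token scales linearly with the number of tokens already in context, and ignores MLP layers, KV-cache memory costs, and other factors):

```python
def attention_cost(cached_tokens: int, new_tokens: int) -> int:
    """Relative attention cost of generating `new_tokens` after a cached prefix.

    Each new token attends to every token before it (the cached prefix plus
    the earlier new tokens), so the total is the sum of (cached + i) over
    the new tokens. Units are arbitrary; only the ratio matters here.
    """
    return sum(cached_tokens + i for i in range(new_tokens))

# Same 1000-token "read/write", different cache sizes:
no_cache = attention_cost(0, 1000)           # no cached prefix
big_cache = attention_cost(100_000, 1000)    # 100k-token cached prefix

print(big_cache / no_cache)  # the big-cache request needs far more attention compute
```

Under this toy model the 100k-cache request does on the order of 200x more attention work than the zero-cache one, even though the billed read/write counts are identical.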