| ▲ | doesnt_know 7 hours ago | |
How do you do "due diligence" on an API that frequently makes undocumented changes and only publishes acknowledgement of change after users complain? You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used. | ||
| ▲ | dlivingston 34 minutes ago | parent | next [-] | |
What is being discussed is KV caching [0], which is used in every transformer-based LLM to reduce per-token inference compute from O(n^2) to O(n). This is not specific to Claude or Anthropic. | ||
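For readers who haven't seen the term: a toy single-head attention decode loop sketches the idea. This is an illustrative NumPy example with made-up names and shapes, not any vendor's implementation; the point is that each new token only computes its own key/value pair and reuses the cached ones, so a decode step costs O(n) instead of recomputing attention over the whole prefix from scratch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8  # head dimension (arbitrary for the example)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []  # the "KV cache": one K and V row per past token

def decode_step(x):
    """Attend the new token x over all cached keys/values: O(n) per step."""
    q = x @ Wq
    k_cache.append(x @ Wk)  # only the NEW token's K and V are computed;
    v_cache.append(x @ Wv)  # past tokens' K and V are reused from the cache
    K = np.stack(k_cache)
    V = np.stack(v_cache)
    attn = softmax(q @ K.T / np.sqrt(d))
    return attn @ V

for _ in range(5):
    out = decode_step(rng.standard_normal(d))
```

Without the cache, every step would rebuild K and V for all n previous tokens, making each step O(n^2) in compute.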
| ▲ | computably 19 minutes ago | parent | prev | next [-] | |
> How do you do "due diligence" on an API that frequently makes undocumented changes and only publishes acknowledgement of change after users complain?

1. Compute scaling with the length of the sequence applies to transformer models in general, i.e. every frontier LLM since ChatGPT's initial release.

2. Since undocumented changes happen frequently, users are all the more incentivized to at least try to have a basic understanding of the product's cost structure.

> You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used.

I think "internal technical implementation" is a stretch. Users don't need to know what a "transformer" is to understand the trade-off. It's not trivial, but it's not incomprehensible to laypersons. | ||
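The cost-structure point can be made concrete with a little arithmetic. The per-token prices below are hypothetical placeholders (not real Anthropic pricing), and the 10x cache discount is an assumption for the example; the shape of the result is what matters: because each chat turn resends the full history as input, total input tokens grow quadratically with the number of turns, and a cheaper cached-read rate mostly applies to exactly that quadratic part.

```python
# Hypothetical per-token rates, illustrative only -- NOT real vendor pricing.
PRICE_INPUT = 3.00 / 1_000_000    # $ per fresh input token (assumed)
PRICE_CACHED = 0.30 / 1_000_000   # $ per cached input token (assumed 10x cheaper)

def session_cost(turns, tokens_per_turn, cached):
    """Each turn resends the full history, so billed input grows quadratically."""
    total = 0.0
    history = 0
    for _ in range(turns):
        # previously seen history can be read from cache; the new turn cannot
        total += history * (PRICE_CACHED if cached else PRICE_INPUT)
        total += tokens_per_turn * PRICE_INPUT
        history += tokens_per_turn
    return total

uncached = session_cost(turns=50, tokens_per_turn=1_000, cached=False)
cached = session_cost(turns=50, tokens_per_turn=1_000, cached=True)
```

Under these assumed rates, most of a long session's cost comes from re-reading history, which is why a change to caching behavior shows up directly on the bill.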
| ▲ | tempest_ 2 hours ago | parent | prev [-] | |
I use CC, and I understand what caching means. I have no idea how that works in an LLM implementation, nor do I actually know what they are caching in this context. | ||