| ▲ | cyanydeez a day ago | |
Reading through this thread, it seems likely this is a KV cache "bug". They're likely evicting the LLM cache too aggressively, so the context is being reloaded too often. It's a "bug" in quotes because it's probably an intended effect of capturing the cost of compute, but it surfaces the fact that they oversold compute to the point where they can't keep the KV cache hot, and now it's thrashing. | ||
| ▲ | bensyverson 21 hours ago | parent [-] | |
Caching helps them too, so I hope they fix it | ||