I get 98.6% cache hits on Claude Code. Short of drastic architectural changes, it’s hard to imagine it getting much better.

gobdovan 17 hours ago | parent | next [-]

A 98.6% cache hit rate doesn't distinguish an efficient workflow from an overly chatty linear agent that keeps reusing the same context. It also says nothing directly about how much useful progress the process makes per token.
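
To make that concrete, here is a rough sketch of a linear agent that resends its whole history every turn. The function name and the token counts are made up for illustration, and it assumes everything sent on the previous turn comes back as a cache hit, which real caches only approximate.

```python
def linear_agent_stats(turns, system_tokens=2000, tokens_per_turn=500):
    """Return (cache_hit_rate, total_input_tokens) for a hypothetical agent.

    Assumption: each turn resends the full history, all of which is a cache
    hit, and only the newly appended tokens are a miss.
    """
    cached = 0    # input tokens served from the cache
    uncached = 0  # input tokens processed fresh
    context = 0   # tokens accumulated in the conversation so far
    for turn in range(turns):
        new = system_tokens if turn == 0 else tokens_per_turn
        cached += context
        uncached += new
        context += new
    total = cached + uncached
    return cached / total, total


if __name__ == "__main__":
    for turns in (10, 100, 200):
        rate, total = linear_agent_stats(turns)
        print(f"{turns:>4} turns: hit rate {rate:.1%}, input tokens {total:,}")
```

Under these assumptions the hit rate climbs just by adding turns, while total input tokens grow roughly quadratically. A high hit rate is what a long, chatty session looks like by default.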

kridsdale1 16 hours ago | parent [-]

We are all going to be graded by (tickets closed / tokens burned) soon enough.

recursive 16 hours ago | parent | next [-]

Sweet. I can get that up to infinity, assuming they're using IEEE-754 division.

nchie 4 hours ago | parent | prev [-]

I doubt it, the difference between someone slightly inefficient and someone extremely efficient isn't big enough to matter compared to how much they cost in salary.

hedgehog 15 hours ago | parent | prev [-]

You pay for cache hits on every turn, and even with the newest architectures longer context is slower and more energy-intensive. Constructing concise turns that reuse the prefix and stop once new context is no longer useful helps, as does pushing generation down into cheaper models while using stronger models for verification.
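
A back-of-the-envelope sketch of the "pay on every turn" point, comparing one long session against the same work split into shorter sessions that each restart from a concise summary. The prices are hypothetical relative units (cached reads at a tenth of fresh input), and `session_cost`, the token counts, and the summary size are all assumptions for illustration, not any provider's actual billing.

```python
def session_cost(turns, prefix_tokens, tokens_per_turn,
                 cached_price=0.1, fresh_price=1.0):
    """Input cost of one session, in hypothetical relative price units.

    Assumption: every turn re-reads the whole accumulated context at the
    cached price and pays the fresh price only for newly added tokens.
    """
    cost = 0.0
    context = 0
    for turn in range(turns):
        # The shared prefix is paid for fresh once, on the first turn.
        new = tokens_per_turn + (prefix_tokens if turn == 0 else 0)
        cost += context * cached_price + new * fresh_price
        context += new
    return cost


# One 100-turn session versus four 25-turn sessions, where each short
# session carries a 1000-token summary on top of the 2000-token prefix.
one_long = session_cost(100, prefix_tokens=2000, tokens_per_turn=500)
four_short = 4 * session_cost(25, prefix_tokens=3000, tokens_per_turn=500)

if __name__ == "__main__":
    print(f"one long session:    {one_long:,.0f}")
    print(f"four short sessions: {four_short:,.0f}")
```

With these made-up numbers the split comes out at less than half the cost even though every token in the long session was a cache hit, because the cached re-reads scale with context length times turn count. Whether a summary actually preserves enough to keep working is the part the model can't tell you.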