jcgrillo a day ago
Yeah, despite the conceptual statelessness, there is quite a bit of state that hangs around--KV cache and context. I still haven't been able to find anything concrete in the docs about how these are isolated. In any case it's clearly a different class of issue than the one from the article. Not endemic to how LLMs work, just normal web session stuff, modulo some GPU memory handling.
ipython 7 hours ago
As far as I know, of the two you identified, only the KV cache is actually cached inside the inference layer. Then again, I am not an expert in designing and operating inference systems, so I could be wrong about that. Either way, both are controlled by deterministic code and not by the LLM itself. So that risk is much simpler to model, IMO, since the mitigation can be applied universally and deterministically rather than hoping and praying some non-deterministic system will respect your wishes.
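
To make the "deterministic" part concrete, here is a rough sketch of one way isolation could be enforced in plain code, assuming a prefix-style KV cache whose lookup key includes a tenant ID. The class, method names, and tenant IDs are all hypothetical and not taken from any real inference server:

```python
import hashlib
from typing import Dict, Optional, Sequence, Tuple


class TenantScopedPrefixCache:
    """Toy stand-in for a KV/prefix cache (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Maps (tenant_id, prefix_digest) -> cached KV blocks (opaque here).
        self._store: Dict[Tuple[str, str], object] = {}

    @staticmethod
    def _key(tenant_id: str, tokens: Sequence[int]) -> Tuple[str, str]:
        # The tenant ID is part of the key, so identical token prefixes
        # from different tenants can never resolve to the same entry.
        digest = hashlib.sha256(repr(tuple(tokens)).encode()).hexdigest()
        return (tenant_id, digest)

    def put(self, tenant_id: str, tokens: Sequence[int], kv_blocks: object) -> None:
        self._store[self._key(tenant_id, tokens)] = kv_blocks

    def get(self, tenant_id: str, tokens: Sequence[int]) -> Optional[object]:
        # Returns None on a miss, including when only another tenant
        # has cached this exact prefix.
        return self._store.get(self._key(tenant_id, tokens))


cache = TenantScopedPrefixCache()
cache.put("tenant-a", [1, 2, 3], "kv-blocks-for-a")
hit = cache.get("tenant-a", [1, 2, 3])
miss = cache.get("tenant-b", [1, 2, 3])
```

The point is only that the isolation check lives in ordinary code that runs the same way on every request, regardless of what the model generates. Whether real serving stacks scope their caches like this is exactly the part I can't confirm from docs.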