In other words: controlling for that kind of potential data-mixing is the same as in any other application where customer data is co-located within the same running process/memory/storage space.


jcgrillo 3 hours ago

Yes, however the companies that are responsible for doing it have already shown their asses a little bit with all the jailbreaking stuff, and we know they produce really awful code from all the recent harness issues... To my mind that indicates this critical invariant deserves a little scrutiny. But with all the vibe slop being slung these days who knows what's safe anymore.

All that is to say I sure would appreciate a coherent, clear technical explanation of how they ensure user data are separate while serving concurrent queries.

wolttam 3 hours ago

They’re valid things to be concerned about IMO. I think you’re looking for an answer you’re not going to get, unfortunately. I think there actually is a higher than average risk of data leakage with the insane optimizations that go into model serving. GLM5.1 had an issue where it would start producing gibberish when their infra was under high load, and it turned out to be a cross-request KV cache contamination issue.[1] Personally, I've been using only local models lately, and it’s gone pretty well!

[1]: https://z.ai/blog/scaling-pain
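For readers wondering how "cross-request KV cache contamination" can happen at all: paged KV caches (vLLM's PagedAttention is the well-known example) hand out fixed-size cache blocks from a pool shared by all concurrent requests and recycle them when a request finishes or is preempted. The toy sketch below is a hypothetical illustration of one way that goes wrong, not a description of z.ai's actual bug or any provider's stack: a request keeps using a block index after giving the block back, and the block has already been reassigned to someone else. `KVBlockPool`, `stale_read`, and the request ids are made-up names, and plain strings stand in for the K/V tensors that would really live on the GPU.

```python
class KVBlockPool:
    """Hypothetical toy pool of KV-cache blocks shared by concurrent requests."""

    def __init__(self, num_blocks: int, check_ownership: bool = True):
        self.data = [None] * num_blocks   # stand-in for each block's K/V tensors
        self.owner = [None] * num_blocks  # request id holding each block, or None
        self.free = list(range(num_blocks))
        self.check_ownership = check_ownership

    def allocate(self, request_id: str) -> int:
        if not self.free:
            raise MemoryError("no free KV blocks")
        idx = self.free.pop()
        self.owner[idx] = request_id
        self.data[idx] = None  # scrub whatever the previous owner left behind
        return idx

    def release(self, request_id: str, idx: int) -> None:
        self._check(request_id, idx)
        self.owner[idx] = None
        self.free.append(idx)

    def write(self, request_id: str, idx: int, kv) -> None:
        self._check(request_id, idx)
        self.data[idx] = kv

    def read(self, request_id: str, idx: int):
        self._check(request_id, idx)
        return self.data[idx]

    def _check(self, request_id: str, idx: int) -> None:
        # The isolation invariant: a request may only touch blocks it currently owns.
        if self.check_ownership and self.owner[idx] != request_id:
            raise PermissionError(
                f"{request_id} touched block {idx} owned by {self.owner[idx]}"
            )


def stale_read(check_ownership: bool):
    """Simulate a request that keeps using a block after giving it back."""
    pool = KVBlockPool(num_blocks=1, check_ownership=check_ownership)
    a = pool.allocate("req-A")
    pool.write("req-A", a, "A's private context")
    pool.release("req-A", a)        # e.g. A is preempted under load...
    b = pool.allocate("req-B")      # ...and the same block is handed to B
    pool.write("req-B", b, "B's private context")
    return pool.read("req-A", a)    # bug: A still holds the stale index


print(stale_read(check_ownership=False))  # A sees B's data
try:
    stale_read(check_ownership=True)
except PermissionError as err:
    print("blocked:", err)
```

Scrubbing on `allocate` keeps B from seeing A's leftovers; the ownership check catches the opposite case shown here, where A reads through an index it no longer owns. Without the check, the first call returns B's context to A, which is exactly the kind of silent mixing the thread is worried about.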