LLMs Corrupt Your Documents When You Delegate (arxiv.org)
45 points by rbanffy 5 hours ago | 12 comments
causal an hour ago
Yeah, I've been saying this for a while: AI-washing any text will degrade it, compounding with each pass. "Semantic ablation" is my favorite term for it: https://www.theregister.com/software/2026/02/16/semantic-abl...

jonmoore an hour ago
I really liked the evaluation method here: testing fidelity by round-tripping through chains of invertible steps. It was striking how even frontier models accumulated errors on seemingly computer-friendly tasks. It would be interesting to know whether the stronger results on Python are more than an artefact of the Python-specific evaluation, whether they carry over to other common general-purpose languages, and whether they are driven by something specific in the training processes.
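To make the round-trip idea concrete, here is a minimal sketch. It is not the paper's harness: the step functions, the `fidelity` score (a `difflib` similarity ratio), and the sample document are all assumptions for illustration, and a deliberately lossy step stands in for a model that does not invert its own edit perfectly.

```python
import codecs
import difflib
from typing import Callable, List, Tuple

# A step is a (forward, inverse) pair of text transforms (hypothetical type).
Step = Tuple[Callable[[str], str], Callable[[str], str]]


def reverse_lines(text: str) -> str:
    """Reverse the order of lines; applying it twice restores the input."""
    return "\n".join(reversed(text.split("\n")))


def rot13(text: str) -> str:
    """ROT13 the letters; applying it twice restores the input."""
    return codecs.encode(text, "rot_13")


def round_trip(doc: str, steps: List[Step]) -> str:
    """Apply every forward step in order, then every inverse in reverse order."""
    out = doc
    for forward, _ in steps:
        out = forward(out)
    for _, inverse in reversed(steps):
        out = inverse(out)
    return out


def fidelity(original: str, recovered: str) -> float:
    """Similarity in [0, 1]; 1.0 means the round trip recovered the text exactly."""
    return difflib.SequenceMatcher(None, original, recovered).ratio()


# Truly invertible steps, standing in for an ideal editor.
EXACT_STEPS: List[Step] = [(reverse_lines, reverse_lines), (rot13, rot13)]
# str.lower does not undo str.upper for mixed-case text: a stand-in for model error.
LOSSY_STEPS: List[Step] = EXACT_STEPS + [(str.upper, str.lower)]

doc = "Alpha line\nBeta line\nGamma line"
print("exact chain:", fidelity(doc, round_trip(doc, EXACT_STEPS)))
print("lossy chain:", fidelity(doc, round_trip(doc, LOSSY_STEPS)))
```

With a real model, each forward and inverse call would be a prompt rather than a pure function, so any score below 1.0 measures what the chain lost along the way.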
woeirua 41 minutes ago
It's an interesting paper, but I'd like to see a lot more about the types of errors that the LLM makes. Are they happening in the forward pass or the inverse pass? My guess is the inverse pass.
adampunk 25 minutes ago
LLMs will make mistakes on every turn. The mistakes will have little to no apparent connection to "difficulty" or to what may or may not be prevalent in the training data. They will be mistakes at all levels of operation, from planning to code writing to reporting. Whether those mistakes matter and whether you catch them is mostly up to you. I have yet to find a model that does not make mistakes each turn. I suspect that this kind of error is fundamentally incorrigible. The most interesting thing about LLMs is that despite the above (and their non-determinism) they're still useful.

cyanydeez an hour ago
I played around with a local LLM to try to build a wiki-like DAG. It made a lot of stupid errors, from vague, generic things like interpreting files based on their names to not following redirects and placing the redirect response in them. I've also had them convert something like an Excel-formatted document to Markdown. It worked pretty well as long as I was examining the output. But the longer it ran in context, the more likely it was to try to slip in things that seemed related but weren't part of the breakdown. The only way I've found to mitigate some of it is to make every file a small, purpose-built doc. This way you can use git to revert changes, and you also limit the damage every time they touch a file to that small context. Anyone who thinks they're a genius creating docs or updating them isn't actually reading the output.
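One way to make "examining the output" cheaper is a small diff guard. This is only a sketch, not something the comment above describes: `changed_lines`, `within_budget`, and the `max_changed` threshold are hypothetical names and values, and the idea is simply to flag an edit for human review when it touches more lines of a small doc than expected.

```python
import difflib
from typing import List


def changed_lines(before: str, after: str) -> List[str]:
    """Return the added ('+ ') and removed ('- ') lines between two versions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


def within_budget(before: str, after: str, max_changed: int = 5) -> bool:
    """True if the edit touched at most max_changed lines (assumed threshold)."""
    return len(changed_lines(before, after)) <= max_changed


before = "intro\nstep one\nstep two"
after = "intro\nstep one\nstep two\nunrequested extra section"

for line in changed_lines(before, after):
    print(line)
print("accept without review:", within_budget(before, after, max_changed=0))
```

A guard like this does not judge whether a change is correct; it only narrows what a person has to read, which is why it pairs naturally with small files and git history.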