wxw 4 hours ago
> [...] newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array

> My strongest hypothesis is that this is not random deterioration but a training artifact. [...] Anthropic’s own client appears to expect and accept a fair amount of slop and repairs it, mostly silently

> If reinforcement learning happens in a harness like that, or a simulation of one, then slightly malformed tool calls can still complete the task and receive reward.

> Worse, the model may become very strongly adapted to the canonical Claude Code edit tool shape.

> Tool schemas are somewhere in the distribution and some shapes are close to what the model saw during post-training and some are far away.

Great article. Interesting root cause hypothesis. Couldn't one simply strip the slop-handling from the RL env's harness to avoid this though?

I do agree on the walled garden being built here. Proprietary frontier models performing best in proprietary harnesses makes sense for Anthropic's interests.
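
To make the question concrete, here is a minimal sketch of the difference between a forgiving harness and a strict one. Everything in it is an assumption for illustration: the field names `old_text` and `new_text`, the `reason` field, and both function names are hypothetical, and this is not the actual schema or code of Pi's or Claude Code's edit tool.

```python
# Hypothetical schema: each entry in edits[] may contain exactly these keys.
ALLOWED_EDIT_KEYS = {"old_text", "new_text"}


def lenient_parse(call: dict) -> list[dict]:
    """Silently drop unknown fields, as a forgiving harness might."""
    return [
        {k: v for k, v in edit.items() if k in ALLOWED_EDIT_KEYS}
        for edit in call.get("edits", [])
    ]


def strict_parse(call: dict) -> list[dict]:
    """Reject the whole call if any edit has an unknown or missing field."""
    edits = call.get("edits", [])
    for i, edit in enumerate(edits):
        extra = set(edit) - ALLOWED_EDIT_KEYS
        missing = ALLOWED_EDIT_KEYS - set(edit)
        if extra or missing:
            raise ValueError(
                f"edits[{i}]: unexpected={sorted(extra)} missing={sorted(missing)}"
            )
    return edits


# A call with an invented field ("reason") in the nested edits[] array.
call = {
    "path": "a.py",
    "edits": [{"old_text": "x", "new_text": "y", "reason": "tidy"}],
}
```

Under these assumptions, `lenient_parse(call)` returns a cleaned edit and the task can still succeed, so the malformed call could still earn reward. `strict_parse(call)` raises `ValueError`, so the same call would fail in an RL environment that used it.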