mmaunder 2 hours ago

Great theory. I'll dig deeper.

mmaunder 2 hours ago | parent [-]

Claude Code has a server-side anti-distillation opt-in called fake_tools, but the local code does not show the actual mechanism.

The client sometimes sends anti_distillation: ['fake_tools'] in the request body at services/api/claude.ts:301

The client still sends its normal real tools: allTools at services/api/claude.ts:1711
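Putting those two observations together, the request body presumably looks something like the sketch below. This is a hypothetical reconstruction from the two call sites, not the actual client source; every field name except anti_distillation and tools is invented for illustration.

```typescript
// Hypothetical shape of the Claude Code request body, inferred from the
// two call sites above. Only `tools` and `anti_distillation` are
// attested; everything else here is an illustrative placeholder.
interface ToolDef {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

interface ClaudeRequestBody {
  model: string;                  // placeholder field
  tools: ToolDef[];               // the client's real tools (allTools)
  anti_distillation?: string[];   // e.g. ['fake_tools'] when opted in
}

function buildRequestBody(allTools: ToolDef[], optedIn: boolean): ClaudeRequestBody {
  const body: ClaudeRequestBody = {
    model: "claude-example",      // invented model id
    tools: allTools,              // real tools are always sent
  };
  if (optedIn) {
    // The opt-in only names the experiment; nothing in the local code
    // changes the tool list when this flag is present.
    body.anti_distillation = ["fake_tools"];
  }
  return body;
}
```

The point of the sketch: the flag and the real tool list travel together, so whatever fake_tools does must happen server-side.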

If the model emits a tool name the client does not actually have, the client turns that into "No such tool available" errors at services/tools/StreamingToolExecutor.ts:77 and services/tools/toolExecution.ts:369
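A minimal sketch of that dispatch pattern, assuming a fixed client-side registry (this is not the actual StreamingToolExecutor code; the registry contents are invented):

```typescript
// Sketch of client-side tool dispatch with a fixed registry. Any tool
// name the model emits that is not registered surfaces as an error,
// which is why a server-appended decoy tool would be user-visible.
type ToolHandler = (input: unknown) => string;

const registry = new Map<string, ToolHandler>([
  // Illustrative entry; the real client registers its full tool set.
  ["Bash", (input) => `ran: ${JSON.stringify(input)}`],
]);

function executeToolCall(name: string, input: unknown): string {
  const handler = registry.get(name);
  if (!handler) {
    // A decoy tool injected server-side and then called by the model
    // would land here and break the session visibly.
    return `No such tool available: ${name}`;
  }
  return handler(input);
}
```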

If Anthropic were literally appending extra normal tool definitions to the live tool set, and Claude used them, that would be user-visible breakage.

That leaves a few possibilities that seem more plausible:

fake_tools is just the name of the server-side experiment, and the implementation is subtler than "append fake tools to the real tool list."

or

The server may inject tool-looking text into hidden prompt context, with separate hidden instructions not to call it.

or

The server may use decoys only in an internal representation that is useful for poisoning traces/training data but not exposed as real executable tools.

cedws 2 hours ago | parent [-]

We do know that Anthropic has the ability to detect when their models are being distilled, so there could be some backend mechanism that needs to be tripped to observe certain behaviour. Not possible to confirm though.

mmaunder an hour ago | parent [-]

Who's we, and how do you know this?