arjie a day ago

I have a little note from the past about the thinking trace[0] where DeepSeek R1 produces a trace like this:

    (Dimethyl(oxo)-lambda6-sulfa雰囲idine)methane donate a CH2rola group occurs in reaction, Practisingproduct transition vs adds this.to productmodule. Indeed"come tally said Frederick would have 10 +1 =11 carbons. So answer q Edina is11.
And then it arrives at the 'right'[1] answer to a chemistry question. If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet. I talked to the author a while ago but forgot to follow up, since his paper was going to come out at NIPS or something, so if someone else finds it, maybe they can share.

0: https://wiki.roshangeorge.dev/w/Blog/2025-10-12/Word_Magic#I...?

1: In the sense of true belief, I suppose.

ekidd a day ago

> If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet.

Yes, several models think in weird jargon. Here is an example of Mythos's thinking while playing solitaire: https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illeg...

> 7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR--—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig--—-BREAK:-9♥

This is a small step in the direction of something called "neuralese", where the model has stopped thinking in English and is thinking in internal vector spaces. Since this gets serialized through text, it isn't quite true neuralese, but it's moving in that direction.

I mean, I'm sympathetic towards the models. My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.

jaggederest a day ago

> My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.

This is something really interesting to me. It turns out there's far more diversity in thinking than you'd imagine, given that we're all largely similar meat-in-a-box. I'm on the visio-spatial-tacit wing, and speaking my thoughts out loud can be very awkward, whereas one of my former coworkers is on the "all thinking is in words, and visual/spatial information comes in the form of words describing the scene" wing, so he can literally narrate his thought process out loud. Very interesting conversations can be had discussing the subjective differences.

chadcmulligan a day ago

Interesting, that probably has something to do with why some people like pair programming. I'm in the visio-spatial-tacit camp and refuse to pair program because it's so much work, but for someone who thinks entirely in words it's probably not a stretch.

jaggederest a day ago

I'm with you. I actually love pair programming, but it might as well be a 10x multiplier on energy depletion, so I can do maybe an hour or two a week before I'm barbecue. It's only recently that I've started to realize that some people don't find pair programming much more difficult than working solo.

drdaeman a day ago

Isn't that just token noise from a broken implementation or model quantization? I've had models spew out nonsense like that, and every time it was either a bug in llama.cpp or a messed-up .gguf file.