different names: chain of continuous thought, latent reasoning, Latent Thought Trajectories, looped language models, neuralese

the path isn't explored more aggressively because its not possible to apply any other selection pressure on such a machine other than just pure cold consequentialism. Specifically, its not possible to apply RLAIF + model spec (Constitutional AI) to stop the system from doing bad things when its helpful to it (like deleting failing tests). If you can notice every time it does something bad during training, and put selection pressure on it so that it doesn't to this in training, it will learn to recognize when it is being tested and will delete failing tests when in production (this is why eval awareness is bad, and labs track this[1])

It is explored a little probably because some researchers haven't thought enough about the downsides of building a uber-consequentialist machine with unreadable thoughts. This is a much larger problem than just making the AI not tell users how to make drugs. There are a lot of dangerous behaviors incentivized by training that are hard to remove. Here's an example of what happens when they aren't removed [2].

> ... not 100% obvious

Meta published a paper[3] on how to build a latent reasoning machine ("culture of irresponsibility") so its clear to them. Anthropic's latest work on NLAs[4] provides a (terribly expensive for now) way to somewhat read the reasoning steps of an LLM, and ignoring the cost, this is very portable to latent reasoning machines. OAI's goal when it comes to their models' CoTs is to make them as smart as possible while leaving them unreadable [5] (you can see this for yourself by running GPT-OSS and looking at the CoT).

[1] https://www.anthropic.com/engineering/eval-awareness-browsec...

[2] https://www.forbes.com/sites/boazsobrado/2026/03/11/alibabas...

[3] search for "coconut ai meta", I don't want to link it here

[4]https://transformer-circuits.pub/2026/nla/index.html

[5] first image here, rest of post is great,https://nickandresen.substack.com/p/how-ai-is-learning-to-th...

edit formating

▲

onlyrealcuzzo 14 hours ago | parent | next [-]

All of the methods you described rely on deterministic paths.

GRAM is unique AFAIK in that it's exploring probabilistic paths.

AFAIK, the deterministic path exploration was nowhere near as impressive as GRAM in terms of reasoning benefits.

GRAM is reasoning better than models 2000-10,000x its size. Deterministic models were 2x-10x improvements.

Naively, GRAM seems to be applying to LLMs what LeCun wants to do with JEPA and World Models.

▲

flossly 14 hours ago | parent | prev | next [-]

To me "deleting a failing test" is not always bad. I've also deleted many failing tests without sabotaging: the test was no longer needed.

I think the "no longer needed" and when that applies is where I simply differ of opinion with an LLM that removed by test -- it I did not want the test to be removed (you seem to imply that); as in some cases I want it to remove my test!

It should remove the test "for the right reasons"; and who gets to decide what's right?

My CLAUDE file has some instructions put there because it was too focuesed on producing "green tests", where I prefer to have a sound test that fails so I can look into it.

	▲	tinthedev 6 hours ago \| parent [-]
		You misunderstand the "test" here to mean programming, rather than test against the model's capabilities.

▲

rstuart4133 10 hours ago | parent | prev [-]

omg. So is the TL;DR:

- Avoiding building something that turns the universe to paper clips in order to satisfy a prompt is a problem they are genuinely struggling with now.

- They do it by spying on the words generated during CoT. "I can do this quickly by turning the Universe into paper clips. Wait - they won't like that. But there is no need to mention it." - SMACK!

- But you can speed things up immensely (3 orders of magnitude!) by skipping the output layer (and I guess compressing the context window / KV cache, otherwise 3 orders of magnitude seem impossible) which would give someone who pulled it off a huge advantage.

- Downside is humans can't see the CoT anymore, so they can't see what the machine is planning. Keeping the final output layer to spy doesn't work because the model uses its hidden reasoning to sanitise it.

How can this possibly go wrong?

	▲	onlyrealcuzzo 5 hours ago \| parent [-]
		Because it doesn't work like how you think at all. You're still thinking it works like Chain of Thought. It doesn't. And the difference is key! It works by introducing probabilistic noise, and exploring N paths fully (each with noise) in parallel (all compressed). It's reasoning at a much, much smaller (probabilistic) level than running everything through the expensive large model (deterministic) and sometimes catching that it said, "I think 1.12 is greater than 1.9 because 12 is bigger than 9, final answer". The easiest way to think about it is: if you understand how hyper words work, it's as if it's searching for different versions of the hyper words that probilisticslly would lead to better outcomes IF it fed them to the LLM before it even does. That's not actually how it works exactly. But I think it is close enough to be helpful to understand where the gain is, a rough idea of what's happening (searching paths), and how it can potentially have huge orders of magnitude improvements (doing so without paying the full price of exploring the paths through the expensive and huge model). And also why it is so much harder to determine what it's "thinking". If you aren't familiar with hyper words, this is an amazing series: https://youtu.be/eMlx5fFNoYc?si=49KHjn5IrVtyyaFq The general idea is that a token is a multidimensional vector to represent a word -> think like "man" is a [noun, singular, English, pronoun, masculine, contemporary, ...]. Each time is sees a new word, it mutates this word to mean some new token (often never before seen), that means something. That's how it can roll-up a 1M line context into a shorter context, and somehow keep most of the meaning. Because it mutates all the words into different words that individually mean nothing, but when put next to each other represent the thing you likely want to do, that the LLM can somehow make sense of. Similarly, GRAM operates entirely in a latent space that doesn't mean anything to us, but it's able to predict N different full paths WITHOUT actually exploring them fully through the LLM before it sends the one it "thinks" is best to the LLM. If you understand how hyper words work, you can understand the noise injection... It's like it's saying, if instead of the user saying "The quick round fox" it said "The quick brown fox" -> I could probably give a response that's more like the answer they want. It's obviously far more sophisticated in the ways it can help than just a simple typo. Something may have pushed a hyper word for "man" to somehow become a lot more like "woman", and GRAM allows it to look at the different hyper words and say... Hmm... Maybe if I changed this one gender dimension over here on this one word, maybe the entire outcome would be dramatically better. Let's try it! Standard models compute these "hyper words" internally but immediately decode them into human language text tokens to form a Chain of Thought. Once decoded into a rigid real word, the multidimensional nuance of the continuous vector is lost! Hyper words are the exact thing that make LLMs able to actually be smart! They can add so much more meaning to a word than a human ever could imagine - try to put 10,000 dimensions on the word "the"... Forcing them to decode them back into our dumb, un-contextualized, rudimentary language and losing all the valuable information they have - just so we can inspect it - OBVIOUSLY makes them enormously less intelligent! It's like if we forced your eyeballs to turn everything it saw into words, before feeding it to your optic nerves, just so your optic nerves could check that you didn't see something harmful, before they sent the words to your brain... Instead of just sending light signals directly.