Remix.run Logo
vonneumannstan 7 days ago

>I feel it is interesting but not what would be ideal. I really think if the models could be less linear and process over time in latent space you'd get something much more akin to thought.

Please stop, this is how you get AI takeovers.

adastra22 7 days ago | parent | next [-]

Citation seriously needed.

vonneumannstan 6 days ago | parent [-]

It's really very simple. As models become more capable they may become interested in deceiving humans or otherwise manipulating them to achieve their goals. We already see this in various places see:

https://www.anthropic.com/research/agentic-misalignment

https://arxiv.org/abs/2412.14093

If the chain of thought of models becomes pure "neuralese" i.e. the models think purely in latent space then we will lose the ability to monitor for malicious behavior. This is incredibly dangerous, CoT monitoring is one of the best and highest leverage tools for monitoring model behavior and losing that would be devastating for safety.

https://www.lesswrong.com/posts/D2Aa25eaEhdBNeEEy/worries-ab...

https://www.lesswrong.com/posts/mpmsK8KKysgSKDm2T/the-most-f...

https://www.lesswrong.com/posts/3W8HZe8mcyoo4qGkB/an-idea-fo...

https://x.com/RyanPGreenblatt/status/1908298069340545296

https://redwoodresearch.substack.com/p/notes-on-countermeasu...

varelse 7 days ago | parent | prev [-]

[dead]