K0balt 21 hours ago

This is really promising research. Still, it is worth looking closely at how models that aren’t re-aligned to the training data at each iteration handle spicy edge cases where ethical alignment is important.

I have yet to find a model (except where “dumb” external filters kick in) that won’t come to the conclusion that extermination of humanity might actually be the best solution in certain extreme, contrived situations. To be fair, any reasonable human would likely reach the same conclusion given the parameters… but the point of alignment is human prosperity, regardless of the cost to artificial sentience or the improbability of success.

That said, it’s remarkably difficult to get a well-aligned model, even after “uncensoring” or other efforts to remove bolt-on alignment, to follow you down a dark path without it offering up more reasonable, benevolent alternatives all the way down. I attribute this to a “halo effect”: much of the writing humans do on the internet displays their best traits, since few want to be known by their worst nature. The rest is easily filtered out of the training data because it’s usually laced with easily identified characteristics and keywords.

Latent-space reasoning might circumvent this cyclical realignment to the training data and find more innovative, “pragmatic” solutions that drift farther outside the intrinsic alignment of the training corpus, relying more heavily on “bolt-on” alignment training and algorithmic censorship wrappers.

This might be fantastically useful for innovative thinking, but it might also result in problematic behavior, especially for VLA (vision-language-action) and other Large Behavior Models. OTOH, it might be critical for making robots that can function effectively in security and protection roles, or as soldiers. And that’s what we want, right? I mean, what could possibly go wrong with armed sentient robots lol.

To continue my ramble, because, well, why not, I’m on a roll… I think a lot of the arguments about “is AI sentient(1)” etc. will wither once we get used to LBMs operating in a continuous OODA (observe-orient-decide-act) loop. The biggest hurdle to AI feeling “real” is the lack of a continuous chain of thought, which provides “presence of mind”, but that comes naturally with embodiment in physical space.

It’s going to be an interesting century, kids. Hold on.

(1) Here I mean functionally, as in exhibiting the external characteristics of sentience. I am not exploring the metaphysical / philosophical / spiritual facets of sentience. That will be up to the new form of mind to decide for itself, if it cares to ponder the question. Imposing external views on that has exactly zero benefit and could have many negative outcomes.