labrador 2 days ago

On a related topic, I've been following with some amusement the outrage on Reddit/r/Grok because Grok will no longer make porn. Apparently Grok was intentionally trained on all the NSFW material on X and Twitter so that it could have a "spicy" mode. And spicy it was. Some of the stuff it made was really good and people loved it. But (allegedly) Musk changed his mind in order to go after enterprise and government accounts, so spicy mode was killed and now there are a lot of angry users complaining on Reddit.

My interest is this: it appears that it's not possible to override the training effectively, since NSFW material bleeds into normal image requests. Musk had this problem before when trying to override Grok's training, and at one point said he would have to retrain Grok. It's interesting to me that LLMs can't be steered effectively, which makes me wonder if they can ever really be aligned ("safe").

duxup 2 days ago | parent | next [-]

I think the more general issue with all AI and "safe" is that AI 'learned' what it knows from human content ... and we object to the content we as humans created.

Hard to avoid that problem.

labrador 2 days ago | parent [-]

> Hard to avoid that problem.

Agree. Even the Christian Bible has horrific content that in some communities would require trigger warnings.

buellerbueller 2 days ago | parent | prev | next [-]

Why do so many supposedly smart humans think that we can make an artificial mind that is capable of AGI (or even something close to it), but with a completely detached evolutionary history and no biological needs, and somehow force it to "align" with our human/biological/societal priorities?

Have none of these people ever had or been a teenager? At least teens have some overlapping biological requirements with non-teens that will force some amount of alignment.

labrador 2 days ago | parent [-]

I think the assumption is that without emotions there is no aggression or evil intent.

buellerbueller a day ago | parent [-]

When I think of the closest analogue I can, emotionless humans, the picture is generally sociopaths: the "non-aligned."

I think the assumption you mention is, frankly, sociopathic.

elpakal 2 days ago | parent | prev [-]

I mean isn't this just considered data poisoning?

labrador 2 days ago | parent [-]

The training data was considered good by Musk to start with, so that he could have spicy mode, but he changed his mind and now Grok is considered poisoned with porn. My question is: can that be fixed, or does he have to start over again?

looobay 2 days ago | parent [-]

There was research on LLM training and distillation showing that if two models share a similar architecture (probably the case for xAI), the "master" model will distill traits to the student model even if they're not in the distillation data. So they would probably need to train a new model from scratch.

(Sorry, I don't remember the name, but there was an example with a model liking owls to showcase this.)

-_- 2 days ago | parent [-]

Subliminal learning: https://alignment.anthropic.com/2025/subliminal-learning/
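The mechanism can be toy-modeled. This is my own illustrative sketch, not code from the paper, using a linear "teacher" and "student" that share an initialization (the condition the paper emphasizes): one distillation step on the teacher's outputs moves the student toward the teacher even when the distillation inputs are pure random noise that carries no information about the teacher's fine-tuned trait.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy model dimension

# Shared initialization for teacher and student.
w0 = rng.normal(size=d)

# Teacher = shared init plus a fine-tuning delta (the "trait").
delta = rng.normal(size=d)
teacher = w0 + delta

# Student starts from the same initialization.
student = w0.copy()

# Distillation inputs: arbitrary random data, unrelated to the trait.
X = rng.normal(size=(32, d))

# One gradient step of MSE distillation on the teacher's outputs.
lr = 0.01
grad = X.T @ (X @ student - X @ teacher) / len(X)
student = student - lr * grad

# The student moved toward the teacher, trait included, even though
# the data never encoded the trait.
dist_before = np.linalg.norm(w0 - teacher)
dist_after = np.linalg.norm(student - teacher)
print(dist_after < dist_before)  # True
```

In this linear setting the update direction is Xᵀ X (teacher − student) / n, which always has a positive component along the teacher's delta, so "neutral" distillation data still transmits the fine-tuned behavior. That is why filtering NSFW content out of the distillation set wouldn't be enough if the new model inherits from the old one.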

labrador 2 days ago | parent [-]

If true, that's bad news for Elon Musk and xAI, because they have to start over. He's already indicated this with regard to Wikipedia: he wants to train on Grokipedia rather than Wikipedia. Removing NSFW material gives him another reason.