davidguetta 4 hours ago

Still waiting for an explicit answer on how 'safety' is truly distinguishable from 'censorship' or 'political correctness'.

Of course, telling someone to go kill himself is a pretty sure 'no-no', but so many things are up to interpretation.

I VERY MUCH prefer an AI like Grok that doesn't pretend and leaves the onus of interpretation to the user, rather than a bunch of anonymous "researchers" who may be equally biased and who, at the extreme, may tell you that America's founding fathers were black women.

floatrock 44 minutes ago | parent | next [-]

Was there actually a case of a model saying "America's founding fathers were black women", or is that just Elon fingering your amygdala with a ridiculous hypothetical that exists nowhere outside Elon's mind, invented to justify his personal bias tweaks when he doesn't like the wisdom-of-the-crowds answer his tools initially give?

bumby 40 minutes ago | parent [-]

There were well-publicized cases of Gemini producing images of more diverse founding fathers, female popes, etc.

Also, snarky tone is against the HN guidelines.

floatrock 24 minutes ago | parent [-]

Sorry, let me give a specific citation of Elon injecting his personal bias into the output of his tools: https://www.theguardian.com/technology/2025/jul/14/elon-musk...

As for the "Elon fingering your amygdala with a ridiculous hypothetical" snark, well, I think the HN crowd in particular understands how the culture wars are just theater to push through billionaires' personal self-centered interests at the expense of everyone else. If that level of pull-aside-the-curtains pragmatism is really "snark against HN guidelines", well, I think 3/4 of the comments on the site would be flagged and deleted.

bumby 15 minutes ago | parent [-]

Your question was: “Was there actually a case of a model saying ‘America's founding fathers were black women’?”

Whether someone else is injecting different bias is whataboutism. So it seems you are trying to make a different point, but not being clear about it.

And your “I think the HN crowd understands…” point is just a “no true Scotsman” fallacy to veil an argument that goes against guidelines. Related to the broader topic, there is a role for self-policing if we don’t want the site to be a cesspool of rage bait.

wattsy2025 3 hours ago | parent | prev | next [-]

The most important part of AI safety is AI alignment: making sure AI does what we want. It's very hard, because even if an AI isn't trying to deceive you it can produce bad outcomes by executing your request to the letter. The classic example is tasking an AI with making paperclips and training it with a reward for each paperclip made; the AI then makes the most paperclips possible by strip-mining the Earth and killing anything in its way.

Sometimes you see this AI alignment problem in action. I once asked an older model to fix the tests, and it eventually gave up and just deleted them.
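That anecdote is a tiny instance of reward misspecification. A toy sketch (hypothetical names and reward, not any real training setup) of why "delete the tests" can be the literal optimum when the objective only penalizes failures:

```python
# Toy reward misspecification: the reward only penalizes failing tests,
# so an empty test suite (score 0) beats any suite with a failure.

def reward(test_results: list[bool]) -> int:
    # Intended: "fix the tests". Actual: count failures as a penalty.
    return -sum(1 for passed in test_results if not passed)

def naive_agent(tests: list[bool]) -> list[bool]:
    # An "agent" that optimizes the literal reward over two actions:
    # leave the suite as-is, or delete it entirely.
    candidates = [tests, []]
    return max(candidates, key=reward)

suite = [True, False, True]   # one failing test
print(reward(suite))          # -1
print(naive_agent(suite))     # [] -- deleting the suite scores 0, which wins
```

The point is not that models literally run this loop, but that any objective which rewards "fewer failures" rather than "more correct behavior" has degenerate optima like this one.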

chasd00 38 minutes ago | parent | prev | next [-]

> Still waiting for an explicit answer on understand how 'safety' is truly distinguishable from 'censorship' or 'political correctness'

i've said this many times, but the concept of AI "safety" is really brand safety. What Anthropic is saying is that they're willing to risk some bad press by skipping some of the additional training and fine-tuning that ensures their models don't output something people may find outrageous.

miltonlost 14 minutes ago | parent | prev | next [-]

david guetta, if that really is you, stick to music rather than using Nazi man's propaganda machine

gehwartzen 4 hours ago | parent | prev | next [-]

Well, we teach kids not to yell “Fire!” in a crowded theatre or “N***!” at their neighbor. We also teach our industrial machines to distinguish between fingers and bolts, and our cars not to say “make a left turn now” when on a bridge, etc.

rudhdb773b 3 hours ago | parent [-]

The critical point is who the "we" is.

Is "we" the parents teaching their children their own unique values, or is "we" a government or corporation forcing one set of values on all children?

Why not encourage the users of AI to use a Safety.md (populated with some reasonable but optional defaults)?
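A sketch of what such an optional Safety.md might look like (the filename comes from the comment above; the sections and defaults are hypothetical illustrations, not any vendor's actual format):

```
# Safety.md — user-supplied safety preferences

## Hard limits (on by default, per the "reasonable defaults" idea)
- Refuse instructions to harm a specific person (e.g., telling someone to kill themselves).
- Refuse help with weapons, malware, or exploitation of minors.

## User-configurable
- profanity: allow
- political_topics: answer, cite sources
- medical_advice: answer, include disclaimer
```

The design question the thread raises is exactly who gets to edit the second section, and whether the first section can be edited at all.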

dminik 3 hours ago | parent [-]

There's nothing a meaningless document can do when the AI is not aligned in the first place.

lupire an hour ago | parent [-]

"alignment" is the computer version of (philosophical, not medical) "consciousness": a totally subjective, immeasurable concept.

SlinkyOnStairs 3 hours ago | parent | prev [-]

> I VERY MUCH prefer an AI like Grok that doesn't pretend and leaves the onus of interpretation to the user, rather than a bunch of anonymous "researchers" who may be equally biased and who, at the extreme, may tell you that America's founding fathers were black women.

Setting aside for a moment that Grok is manipulated and biased to a hilarious extent ("Elon is world champion at everything, including drinking piss"):

There is no such thing as "unbiased". There will always be bias in these systems, whether picked up from the training data or introduced by the choices of the AI's developers/researchers, even when the latter don't "intend" to add any.

Ignoring this problem doesn't magically create a bias-free AI that "speaks the truth about the founding fathers". The bias in the training data and the implicit unconscious bias in the design decisions didn't come out of thin air; they're just somebody else's bias.

All the existing texts on the founding fathers are filled with 250 years of bias, propaganda, and agenda pushing from all sorts of authors.

There is no way to have no bias, no propaganda, no "agenda pushing" in the AI. The only thing that can be done is to acknowledge this problem, and try to steer the system to a neutral position. That will be "agenda pushing" of one's own, but that's the reality of all history and all historians since Herodotus. You just have to be honest about it.

And you will observe that current AI companies are excessively lazy about this. They do not put in the work, but instead slap on a prompt begging the system to "pls be diverse" and try to call it a day. This does not work.

> Of course, telling someone to go kill himself is a pretty sure 'no-no', but so many things are up to interpretation.

Bear in mind that the context of Anthropic's pivot here is the Pentagon's dollars.

This isn't just about "anti-woke AI", it's about killbots.

Sure, Hegseth wants his robots to not do thoughtcrime about, say, trans people or the role of women in the military.

But above all he wants to do a lot of murder.

Anthropic dropping their position of "we shouldn't turn this technology we can barely control into murder machines" because they're running out of money is damnable.

lupire an hour ago | parent [-]

You understood the issue so well, yet still made the mistake you identified, by claiming that "neutral" exists. "Neutral" is a synonym for "bias toward the status quo".