District5524 15 hours ago

I asked Sora to turn a random image of my friend and myself into Italian plumbers. Nothing more, just the two words "Italian plumbers". The created picture was not shown to me because it was in violation of OpenAI's content policy. I then asked just to turn the guys in the picture into plumbers, but I asked this in Italian. Without me asking for it, Sora put me in overalls and gave me a baseball cap, and my friend another baseball cap. When I asked Sora to put mustaches on us, one of us received a red shirt as well, without being asked to. Starting with the same pic, when I asked to put one letter on each of the baseball caps - guess what, the letters chosen were M and L. These extra guardrails are not really useful when these image creation tools have such a strong, built-in bias towards copyright infringement. Does this mean that, with time, Dutch pictures will have to include tulips, Italian plumbers will have to have a uniform with baseball caps with L and M, etc., just so as not to confuse AI tools?

Cthulhu_ 11 hours ago | parent | next [-]

You (and the article, etc.) show what a lot of the "work" in AI is going into at the moment: creating guardrails against creating something that might get them in trouble, and/or customizing weights and prompts under the hood to generate stuff that isn't the obvious. I'm reminded of when Google's image generator came out and this customization bit them in the ass when it generated a black pope or Asian Vikings. AI tools don't do what you wish they did, they do what you tell them and what they are taught, and if 99% of their learning set associates Mario with prompts for Italian plumbers, that's what you'll get.

A possible (and probably already existing) business is setting up truly balanced learning sets, that is, thousands of unique images that match the idea of an Italian plumber, with maybe 1% of Mario. But that won't be nearly as big a learning set as the whole internet is, nor will it be cheap to build compared to just scraping the internet.

rurp 3 hours ago | parent | next [-]

>> they do what you tell them and what they are taught, and if 99% of their learning set associates Mario with prompts for Italian plumbers, that's what you'll get.

I thought that a lot of the issues were the opposite of this, where Google put their thumb on the scale to go against what the prompt asked. Like when someone would ask for a historically accurate picture of a US senator from the 1800s and repeatedly get women and non-white men. The training set for that prompt has to be overwhelmingly white men so I don't think it was just a matter of following the training data.

feoren 9 hours ago | parent | prev [-]

I remember all the hullaballoo about Asian Vikings and the like. It was so preposterous that Vikings would ever be Asian that it must be ultra-woke DEI mind-worms being forced onto AI! But of course, as far as the AI's concerned, it is even more preposterous that an Italian plumber would not be wearing red or green overalls with a mustache and a lettered baseball cap. I don't see any way you can get the AI to recognize that Vikings "should" be white people and not also think that Italian plumbers "should" look like that. Are they allowed to recombine their training data or must they strictly adhere to only what they've seen?

Of course the irony is that if the people who get offended whenever they see images of non-white people asked for a picture of "Vikings being attacked by Godzilla", they'd get worked up if any of the Vikings in the picture were Asian (how unrealistic!). It's a made-up universe! The image contains a damn (Asian) Kaiju in it, and everyone is supposed to be pissed because the Vikings are unrealistic!?

GuB-42 8 hours ago | parent | next [-]

That's what you get when you expect AIs to be like humans and be able to reason. We would be pissed if a human artist did that, so we are pissed when AIs do it.

A human, even one whose only experience of an Italian plumber is Mario, will be able to draw an Italian plumber who is not Mario. That's because he knows that Mario is just a video game character and doesn't even do much plumbing. He knows, however, what an actual non-Italian plumber looks like, and that a guy doing plumbing work in Italy is more likely to look like a regular Italian guy equipped like a non-Italian plumber than like a video game character.

And if asked to draw a Viking, he knows that Vikings are people originating from Scandinavia, so they can't be Asian by definition, even in an Asian context. A human artist can adjust things to the unrealistic setting, but unless presented with a really good reason, will not change the core traits of what makes a Viking a Viking.

But it requires reasoning. Which current image generating AIs don't have.

feoren 6 hours ago | parent [-]

> We would be pissed if a human artist did that

No, I would not be pissed if a human artist drew an Asian Viking. Do you get pissed when a human artist draws a white Jesus? Why are we justifying internet outrage over an Asian Viking when people have been drawing this middle-eastern Jew as white for centuries?

> A human artist can adjust things to the unrealistic setting, but unless presented with a really good reason, will not change the core traits of what makes a Viking a Viking.

If you asked Matt Stone and Trey Parker to draw a Viking, are you sure it would contain the "core traits of what makes a Viking a Viking?" What if you asked Picasso to draw a Viking? The Vikings in The Simpsons would be yellow, and nobody would complain. Would you be offended if you asked Hokusai to draw a Viking and it came out looking Asian? Vikings didn't even have those stupid horned helmets that everyone draws them with! Is their dumb, historically inaccurate horned helmet a core part of what makes a Viking a Viking? What the hell are we even talking about? It's crystal clear that all of these "historical accuracy" drums are only ever beaten when some white person is offended that non-white people exist. Otherwise, nobody gives a shit about historical accuracy. There's a fucking Kaiju in the image!

Like any artist, Gemini had a particular style. That style happened to be a multi-cultural one, and what we learned is that a multi-culture style is absolutely enraging to people unless it results in more Whiteness.

Consider elves instead of Vikings. People would also be offended if an AI drew elves as black people with pointy ears. There's no "a human artist should know that elves have to be white" bullshit defense there. There's no historical accuracy bullshit. There's only racism.

jerf 6 hours ago | parent | prev | next [-]

The AIs were not "naturally" generating images of Asian Vikings. It was established to my satisfaction, even if the companies never admitted it (I don't recall it happening but I may have missed it), that it was actually the prompt being rather hamhandedly edited on the way to the image generator, for the clear purpose of "correcting" the opinions and attitudes of those issuing the prompts through social engineering.

Unsurprisingly, people don't like being so nakedly herded in their opinions. When the "nudges" become "shoves" people object.

feoren 6 hours ago | parent [-]

My point is that there is no prompt engineering that could keep Vikings white without also keeping Italian plumbers looking like Mario. Unless you singled out Mario, but there are too many examples to do that with. The AI does not put Mario in a different category than a Viking. You have to try to get the AI to avoid using exact literal imagery, to make sure it's mixing things up a bit, varying facial features and clothing styles when it shows people ... you know, being "diverse". How are we supposed to get an Italian plumber in anything other than red overalls without getting a Viking wearing a sari?

The Gemini prompt was something like "make sure any images of people show a diverse range of humans", or something. Yes, it was totally ham-handed, but that's not what people were pissed about. It's also ham-handed that we can't generate a nipple, or a swear word, or violence. Why does "make sure images do not contain excessive violence" not piss people off? The Vikings were fucking brutal. It would be very historically accurate to show them raping women and cutting people's limbs off. Are we all supposed to be pissed that AI does not generate that image? It's just as ham-handed as "make sure humans are diverse". No, it was not the ham-handedness that enraged people. It was not the historical inaccuracy. It was the word "diverse".

feoren 6 hours ago | parent | prev [-]

I'm assuming the downvoters are the ones who get offended at the sight of an Asian Viking, so let me ask you this:

In a work of fiction -- which you're automatically asking for when you ask an AI to generate an image -- in a work of fiction, would you be offended if you saw a white Ninja? A white Samurai? A white Middle-Eastern Jew born in Roman times? Would there have been internet outrage over pictures of white Samurai? We all know the answer: no, of course not. So why is an Asian Viking offensive when a white Samurai is not? Why are we supposed to get angry about an Asian Viking, but a white Jesus is just A-OK? What could the difference possibly be? Anyone?

noworriesnate 4 hours ago | parent [-]

People get offended about these all the time, it’s called cultural appropriation[1]. It’s not just whites who have culture they dislike being appropriated, though whites do get offended by this as well, like any people with a rich cultural tapestry.

[1] https://en.wikipedia.org/wiki/Cultural_appropriation

barbazoo 7 hours ago | parent | prev | next [-]

I feel like the golden and fun age of GenAI is already over.

echelon 9 hours ago | parent | prev | next [-]

OpenAI will eventually have competition for GPT-4o image generation.

They'll eventually have open source competition too. And then none of this will matter.

OmniGen is a good start, just woefully undertrained.

The VAR paper is open, from ByteDance, and supposedly the architecture this is based on.

Black Forest Labs isn't going to rest on their laurels. Their entire product offering just became worthless and lost traction. They're going to have to answer this.

I'd put $50 on ByteDance releasing an open source version of this in three months.

jxramos 7 hours ago | parent | prev | next [-]

lol, this interaction may possibly become known as "grooming the AI"

artursapek 7 hours ago | parent | prev | next [-]

that’s hilarious

adr1an 6 hours ago | parent | prev [-]

Another example of prime reasoning capabilities /s