| ▲ | MgB2 a day ago |
| Idk, the models generating what are basically 1:1 copies of the training data from pretty generic descriptions feels like a severe case of overfitting to me. What use is a generative model that just regurgitates its input? I feel like the less advanced generations, maybe even because of their limitations in terms of size, were better at coming up with something that at least feels new. In the end, other than for copyright-washing, why wouldn't I just use the original movie still/photo in the first place? |
|
| ▲ | jeroenhd 17 hours ago | parent | next [-] |
| People like what they already know. When they prompt something and get a realistic looking Indiana Jones, they're probably happy about it. To me, this article is further proof that LLMs are a form of lossy storage. People attribute special quality to the loss (the image isn't wrong, it's just got different "features" that got inserted) but at this point there's not a lot distinguishing a seed+prompt file+model from a lossy archive of media, be it text or images, and in the future likely video as well. The craziest thing is that AI seems to have gathered some kind of special status that earlier forms of digital reproduction didn't have (even though those 64kbps MP3s from Napster were far from perfect reproductions), probably because now it's done by large corporations rather than individuals. If we're accepting AI-washing of copyright, we might as well accept pirated movies, as those are re-encoded from high-resolution originals as well. |
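The "seed + prompt + model is a lossy archive" point can be made concrete with a toy sketch (only Python's stdlib, with a seeded PRNG standing in for a generative model): a deterministic generator plus a seed reproduces its output exactly, so shipping the seed is effectively shipping the content.

```python
import random

def generate(seed: int, n: int = 16) -> bytes:
    """Stand-in for a generative model: output fully determined by the seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

# The (seed, generator) pair acts as an archive of the output: anyone with
# the same "model" reconstructs the exact bytes from a few bytes of seed.
original = generate(42)
reconstructed = generate(42)
print(original == reconstructed)  # True
```

A real model swaps the PRNG for learned weights and the bare seed for seed+prompt, but the archival property is the same: to the extent the model has memorized a frame, the prompt is just its address.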
| |
| ▲ | AlienRobot 9 hours ago | parent | next [-] | | The year is 2030. A new MCU movie is released, its 60-second trailer posted on YouTube, but I don't feel like watching the movie because I got bored after Endgame. YouTube has very strict anti-scraping techniques now, so I use deep-scraper to generate the whole trailer from the thumbnail and title. I use deep-pirate to generate the whole 3-hour movie from the trailer. I use deep-watcher to summarize the whole movie in a 60-second video. I watch the video. It doesn't make any sense. I check the YouTube trailer. It's the same video. | |
| ▲ | balamatom 14 hours ago | parent | prev [-] | | Probably the majority of people in the world already "accept pirated movies". It's just that, as ever, nobody asks people what they actually want. Much easier to tell them what to want, anyway. To a viewer, a human-made work and an AI-generated one both amount to a series of stimuli that someone else made and you have no control over; and when people pay to see a movie, generally they don't do it with the intent to finance the movie company to make more movies -- they do it because they're offered the option to spend a couple hours watching something enjoyable. Who cares where it comes from -- if it reached us, it must be good, right? The "special status" you speak of is due to AI's constrained ability to recombine familiar elements in novel ways. 64k MP3 artifacts aren't interesting to listen to; while a high-novelty experience such as learning a new culture or a new discipline isn't accessible (and also comes with expectations that passive consumption doesn't have.) Either way, I wish the world gave people more interesting things to do with their brains than make a money, watch a movies, or some mix of the two with more steps. (But there isn't much of that left -- hence the concept of a "personal life" as reduced to breaking one's own and others' cognitive functioning then spending lifetimes routing around the damage. Positively fascinating /s) |
|
|
| ▲ | yk 19 hours ago | parent | prev | next [-] |
| Tried Flux.dev with the same prompts [0] and it actually seems to be a GPT problem. Could be that in GPT the text encoder understands the prompt better and just generates the implied IP, or could be that a diffusion model is just inherently less prone to overfitting than a multimodal transformer model. [0] https://imgur.com/a/wqrBGRF Image captions are the implied IP; I copied the prompts from the blog post. |
| |
| ▲ | jsemrau 19 hours ago | parent [-] | | DALL-E 3 already uses a model trained on synthetic data that takes the prompt and augments it. This might lead to the overfitting. It could also be, and might be the simpler explanation, that it just looks up the right file via RAG. |
|
|
| ▲ | vjerancrnjak 19 hours ago | parent | prev | next [-] |
| If it overfits on the whole internet, then it's like a search engine that returns really relevant results with some lossy side effects. A recent benchmark on the unseen 2025 Math Olympiad shows none of the models can problem-solve. They all, accidentally or on purpose, had prior solutions in the training set. |
| |
| ▲ | jks 19 hours ago | parent [-] | | You probably mean the USAMO 2025 paper. They updated their comparison with Gemini 2.5 Pro, which did get a nontrivial score. That Gemini version was released five days after USAMO, so while it's not entirely impossible for the data to be in its training set, it would seem kind of unlikely. https://x.com/mbalunovic/status/1907436704790651166 | | |
| ▲ | MatthiasPortzel 12 hours ago | parent | next [-] | | The claim is that these models are training on data which includes the problems and explanations. The fact that the first model trained after the public release of the questions (and crowdsourced answers) performs best is not a counterexample, but is expected and supported by the claim. | |
| ▲ | jsemrau 19 hours ago | parent | prev | next [-] | | That timing is actually suspicious. And it would not be the first time something like this happened. | |
| ▲ | iamacyborg 18 hours ago | parent | prev [-] | | I was noodling with Gemini 2.5 Pro a couple days ago and it was convinced Donald Trump didn’t win the 2024 election and that he conceded to Kamala Harris so I’m not entirely sure how much weight I’d put behind it. |
|
|
|
| ▲ | gertlex a day ago | parent | prev | next [-] |
| What if the word "generic" were added to a lot of these image prompts? "generic image of an intergalactic bounty hunter from space" etc. Certainly there's an aspect of people using the chat interface like they use Google: describe xyz to try to surface the name of a movie. Just in this case, we're doing the (less common?) query of: find me the picture I can vaguely describe; but it's a query to an image /generating/ service, not an image search service. |
| |
| ▲ | squeaky-clean a day ago | parent [-] | | Generic doesn't help. I was using the new image generator to try and make images for my Mutants and Masterminds game (it's basically D&D with superheroes instead of high fantasy), and it refuses to make most things, citing that they are too close to existing IP, or that the ideas are dangerous. So I asked it to make 4 random and generic superheroes. It created Batman, Supergirl, Green Lantern, and Wonder Woman. Then at about 90% finished it deleted the image and said I was violating copyright. https://imgur.com/a/eG6kmqu I doubt the model you interact with actually knows why the babysitter model rejects images, but it claims to know, which leads to some funny responses. Here is its response to me asking for a superhero with a dark bodysuit, a purple cape, a mouse logo on their chest, and a spooky mouse mask on their face. > I couldn't generate the image you requested because the prompt involved content that may violate policy regarding realistic human-animal hybrid masks in a serious context. | | |
| ▲ | gertlex a day ago | parent [-] | | Yeah... so much for that hope on my end! Thanks for testing. (hard to formulate why I was too lazy to test myself :) ) | | |
| ▲ | genewitch 16 hours ago | parent [-] | | Because it's depressing how much money was burned for this sort of result? That makes me pretty lazy. | | |
| ▲ | gertlex 10 hours ago | parent [-] | | Probably more: "I'll save my energy for interacting with AI for something more useful to me" |
|
|
|
|
|
| ▲ | tshaddox a day ago | parent | prev | next [-] |
| Idk, a couple of the examples might be generic enough that you wouldn't expect a very specific movie character. But most of the prompts make it extremely clear which movie character you would expect to see, and I would argue that the chat bot is working as expected by providing that. |
| |
| ▲ | grotorea a day ago | parent [-] | | Even if I'm thinking of an Indiana Jones-like character, that doesn't mean I want literally Indiana Jones. If I wanted Indiana Jones, I could just grab a scene from the movie. | | |
| ▲ | infthi 17 hours ago | parent | next [-] | | If someone gets Indiana Jones from the suspiciously detailed request the author provided and it turns out they wanted something else, they can clarify that to the chat bot, e.g. by copying this very comment. I have a strong suspicion that many human artists would behave the way the chat bot did (unless they start asking clarifying questions, which chatbots should learn to do as well). | |
| ▲ | m000 15 hours ago | parent | prev [-] | | Good luck grabbing that Ghibli movie scene, where Indiana Jones arm-wrestles Lara Croft in a dive-bar, with Brian Flanagan serving a cocktail to Allan Quatermain in the background. |
|
|
|
| ▲ | RataNova 18 hours ago | parent | prev | next [-] |
| Yeah, I've been feeling the same. When a model spits out something that looks exactly like a frame from a movie just because I typed a generic prompt, it stops feeling like “generative” AI and more like "copy-paste but with vibes." |
| |
| ▲ | FeepingCreature 17 hours ago | parent [-] | | To my knowledge this happens when that single frame is overrepresented in its training data. For instance, variations of the same movie poster or screenshot may appear hundreds of times. Then the AI concludes that this is just a unique human cultural artifact, like the Mona Lisa (which I would expect many human artists could also reproduce from memory). |
|
|
| ▲ | fennecfoxy 10 hours ago | parent | prev | next [-] |
| Probably an over-representation in the training data, really, which causes the overfitting. Training on data in the proportions it appears on the Internet means the model is going to be opinionated about human culture (Bart Simpson is popular, so there are lots of images of him; Ori is less well known, so there are fewer images). Ideally it would train 1:1 on everything, but that would involve _so_ much work pruning the training data to get a roughly equal effect between categories. |
|
| ▲ | stevage a day ago | parent | prev | next [-] |
| Why? For fan content, using the original characters in new situations, mashups, new styles etc. |
|
| ▲ | Lerc a day ago | parent | prev | next [-] |
| I'm not sure if this is a problem with overfitting. I'm ok with the model knowing what Indiana Jones or the Predator looks like with well remembered details, it just seems that it's generating images from that knowledge in cases where that isn't appropriate. I wonder if it's a fine tuning issue where people have overly provided archetypes of the thing that they were training towards. That would be the fastest way for the model to learn the idea but it may also mean the model has implicitly learned to provide not just an instance of a thing but a known archetype of a thing. I'm guessing in most RLHF tests archetypes (regardless of IP status) score quite highly. |
| |
| ▲ | masswerk a day ago | parent | next [-] | | What I'm kind of concerned about is that these images will persist and will be reinforced by positive feedback. Meaning, an adventurous archeologist will be the very same image, forever. We're entering the epitome of dogmatic ages. (And it will be the same corporate images and narratives, over and over again.) | |
| ▲ | duskwuff 20 hours ago | parent | next [-] | | And it's worth considering that this issue isn't unique to image generation, either. | | |
| ▲ | masswerk 14 hours ago | parent | next [-] | | E.g., I think there are now entire generations who never played with anything as a child that wasn't tied in with corporate IP in one way or another. | |
| ▲ | Lerc 20 hours ago | parent | prev [-] | | Santa didn't always wear red. | | |
| ▲ | 52-6F-62 8 hours ago | parent [-] | | Granted, but that's not the best example: red and green are the emblematic colours elves wore in northern European cultures.
Santa is somewhat syncretic with Robin Goodfellow or Robin Redbreast, Puck, the Púca, etc. It wasn't really a cola invention. |
|
| |
| ▲ | baq 17 hours ago | parent | prev [-] | | Welcome to the great age of slop feedback loops. |
| |
| ▲ | vkou 20 hours ago | parent | prev [-] | | > I'm ok with the model knowing what Indiana Jones or the Predator looks like with well remembered details, ClosedAI doesn't seem to be OK with it, because they are explicitly censoring characters of more popular IPs. Presumably as a fig leaf against accusations of theft. | | |
| ▲ | red75prime 15 hours ago | parent | next [-] | | If you define feeding of copyrighted material into a non-human learning machine as theft, then sure. Anything that mitigates legal consequences will be a fig leaf. The question is "should we define it as such?" | | |
| ▲ | reginald78 10 hours ago | parent | next [-] | | The fact that they have guardrails to try and prevent it means OpenAI themselves think it is at least shady or outright illegal in some way. Otherwise why bother? | |
| ▲ | vkou 14 hours ago | parent | prev [-] | | If a graphics design company was using human artists to do the same thing that OpenAI is, they'd be sued out of existence. But because a computer, and not a human does it, they get to launder their responsibility. | | |
| ▲ | red75prime 13 hours ago | parent [-] | | Doing what? Telling their artists to create what they want regardless of copyright and then filtering the output? For humans it doesn't make sense because we have generation and filtering in a single package. | | |
| ▲ | vkou 6 hours ago | parent [-] | | In this case the output wasn't filtered. They are just producing images of Harrison Ford, and I don't think they are allowed to use his likeness in that way. |
|
|
| |
| ▲ | Lerc 8 hours ago | parent | prev [-] | | There is a difference between knowing what something looks like and generating an image of that thing. |
|
|
|
| ▲ | fermisea a day ago | parent | prev | next [-] |
| Why? Replace the context and not having that property is now called a hallucination. Overall the model is tra |
|
| ▲ | o11c a day ago | parent | prev | next [-] |
| It's not a single image though. Stitching 3 or so input images together clearly makes copyright laundering legal. |
| |
| ▲ | otabdeveloper4 19 hours ago | parent [-] | | No it doesn't. Commercial intent (usually) makes it illegal, not the fact of copying. So the criminal party here would be OpenAI, since they are selling access to a service that generates copyright-infringing images. |
|
|
| ▲ | ToucanLoucan a day ago | parent | prev | next [-] |
| > I feel like the less advanced generations, maybe even because of their limitations in terms of size, were better at coming up with something that at least feels new. Ironically that's probably because the errors and flaws in those generations at least made them different from what they were attempting to rip off. |
| |
|
| ▲ | ramraj07 20 hours ago | parent | prev [-] |
| So I train a model to say y=2, and then I ask the model to guess the value of y and it says 2, and you call that overfitting? Overfitting is if you didn't exactly describe Indiana Jones and then it still gave Indiana Jones. |
| |
| ▲ | MgB2 20 hours ago | parent | next [-] | | The prompt didn't exactly describe Indiana Jones though. It left a lot of freedom for the model to make the "archeologist" e.g. female, Asian, put them in a different time period, have them wear a different kind of hat etc. It didn't though, it just spat out what is basically a 1:1 copy of some Indiana Jones promo shoot. Nowhere did the prompt ask for it to look like Harrison Ford. | |
| ▲ | fennecfoxy 10 hours ago | parent | next [-] | | But the concentration of training data, driven by human culture and the popularity of characters/objects, means that if I give a random person the same description of a character that the AI got and ask "who am I talking about, what do they look like?", there's a very high likelihood that they'll answer "Indiana Jones". | |
| ▲ | fluidcruft 19 hours ago | parent | prev | next [-] | | But... the prompt neither forbade Indiana Jones nor did it describe something that excluded Indiana Jones. If we were playing Charades, just about anyone would have guessed you were describing Indiana Jones. If you gave a street artist the same prompt, you'd probably get something similar unless you specified something like "... but something different than Indiana Jones". | | |
| ▲ | 9dev 18 hours ago | parent | next [-] | | And… that is called overfitting. If you show the model values for y, but they are 2 in 99% of all cases, it’s likely going to yield 2 when asked about the value of y, even if the prompt didn’t specify or forbid 2 specifically. | | |
| ▲ | IanCal 14 hours ago | parent | next [-] | | > If you show the model values for y, but they are 2 in 99% of all cases, it’s likely going to yield 2 when asked about the value of y That's not overfitting. That's either just correct or underfitting (if we say it's never returning anything but 2)! Overfitting is where the model matches the training data too closely and has inferred a complex relationship using too many variables where there is really just noise. | |
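IanCal's distinction can be sketched numerically (a minimal illustration, assuming numpy; a degree-0 and a degree-9 polynomial stand in for the "just correct" fit and the overfit one):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 + rng.normal(0.0, 0.1, size=10)  # "y is 2" in essentially all cases, plus noise

# Degree-0 fit: learns the mean, ~2. Answering "2" here is just the correct fit.
flat = np.polynomial.Polynomial.fit(x, y, deg=0)

# Degree-9 fit through 10 points: interpolates the noise exactly -- overfitting.
wiggly = np.polynomial.Polynomial.fit(x, y, deg=9)

print(float(flat(0.5)))                    # close to 2
print(float(np.abs(wiggly(x) - y).max()))  # ~0: it memorized the noise
```

The degree-9 curve scores perfectly on its training points, yet the flat line is the better model of the relationship; returning "2" (or Indiana Jones, for an Indiana-Jones-shaped prompt) is the flat line, not the wiggly one.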
| ▲ | FeepingCreature 17 hours ago | parent | prev [-] | | I would argue this is just fitting. | | |
| ▲ | fluidcruft 8 hours ago | parent [-] | | If you take the perspective of all the possible responses to the request, then it is overfit, because it only returns a non-generalized response. But if you look at it from the perspective that there is only one example to learn from, it is maybe not overfit. |
|
| |
| ▲ | darkwater 17 hours ago | parent | prev [-] | | The nice thing about humans is that not every single human being has read almost every piece of content on the Internet. So yeah, a certain group of people would draw or think of Indiana Jones with that prompt, but not everyone.
Maybe we will have different models with different trainings/settings that permit this kind of freedom, although I doubt it will be the commercial ones. | |
| ▲ | dash2 17 hours ago | parent [-] | | I mean, did anyone here read the prompt and not think “Indiana Jones”? | | |
| ▲ | sethammons 14 hours ago | parent | next [-] | | I didn't think it. I imagined a cartoonish chubby character in typical tan safari gear with a like-colored round explorer hat and swinging a whip like a lion tamer. He is mustachioed, light skin, and bespectacled. And I am well familiar with Dr. Jones. | |
| ▲ | darkwater 15 hours ago | parent | prev [-] | | Is HN the whole world? Isn't an AI model supposed to be global, since it has ingested the whole Internet? How can you express, in terms of AI training, ignoring the existence of something that's widely present in your training data set? If you ask the same question to an 18yo girl in rural Thailand, would she draw Harrison Ford as Indiana Jones? Maybe not. Or maybe she would. But IMO an AI model must be able to provide a more generic (unbiased?) answer when the prompt wasn't specific enough. | |
| ▲ | lupusreal 14 hours ago | parent [-] | | Why should the AI be made to emulate a person naive to extant human society, tropes and customs? That would only make it harder for most people to use. Maybe it would have some point if you were targeting users in a substantially different social context. In that case, you would design the model to be familiar with their tropes instead. So when they describe a character iconic in their culture by a few distinguishing characteristics, it would produce that character for them. That's no different at all. |
|
|
|
| |
| ▲ | crooked-v 7 hours ago | parent | prev [-] | | Or even just 'obvious Indiana Jones knockoff who isn't literally Harrison Ford'. Comics do that kind of thing constantly for various obviously inspired but legally distinct characters. |
| |
| ▲ | whywhywhywhy 15 hours ago | parent | prev [-] | | What would most humans draw when you describe such a well-known character by their iconic elements? Think about it: if you deviated and acted the pedant about it, people would think you were just trying to prove a point or being obnoxious. |
|