Jordan-117 7 hours ago

To me, it feels as impossible (spooky, even) as the way image models work.

Consider a model like SDXL:

- each image is 1024x1024 (SDXL's native resolution), plenty of detail

- max prompt length is 77 tokens, or a solid paragraph

- each image has a seed value between 0 and 9,999,999, with each seed giving a completely different take on the prompt
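Roughly, the seed matters because generation starts from random noise in the model's latent space, and the seed fixes that noise exactly. The sketch below illustrates the idea using Python's stdlib `random` rather than the PyTorch generators real SDXL tools use, so the numbers will not match any actual tool. The latent shape of 4 x 128 x 128 (a 1024x1024 image downsampled 8x per side, 4 channels) is an assumption for illustration, and `initial_noise` is a hypothetical helper.

```python
import random

# Assumed SDXL-style latent shape for a 1024x1024 image: 4 channels, 8x downsampled per side.
LATENT_SHAPE = (4, 128, 128)


def initial_noise(seed: int, shape=LATENT_SHAPE) -> list[float]:
    """Hypothetical helper: return flat standard-normal noise fully determined by `seed`."""
    rng = random.Random(seed)  # a fresh generator per image, so the seed alone fixes the output
    count = shape[0] * shape[1] * shape[2]
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


same_a = initial_noise(42)
same_b = initial_noise(42)
different = initial_noise(43)
print(same_a == same_b)     # True: same seed, same starting noise
print(same_a == different)  # False: new seed, new starting point for the denoiser
```

The same prompt denoised from a different starting point lands on a different image, which is why each seed gives its own take.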

I can't begin to calculate the upper limit on the number of human-readable prompts that fit in 77 tokens. But multiply even an (extremely conservative) estimate of a million possible prompts by 10 million seeds, and it's clear that this model "contains", at minimum, literally ten trillion possible meaningful images, all in a model file that's under 7 GB.
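Here is the same back-of-the-envelope arithmetic as a quick sketch. The inputs are the comment's own conservative figures (7 GB is treated as an upper bound), and the raw-image size assumes an uncompressed 1024x1024 RGB image at 8 bits per channel.

```python
PROMPTS = 1_000_000        # deliberately low estimate of distinct meaningful prompts
SEEDS = 10_000_000         # seed values 0 through 9,999,999
MODEL_BYTES = 7 * 10**9    # "under 7 GB", used as an upper bound

images = PROMPTS * SEEDS
bytes_per_image = MODEL_BYTES / images

# Uncompressed 1024x1024 RGB image, 1 byte per channel (assumption for comparison).
raw_image_bytes = 1024 * 1024 * 3

print(f"{images:,} images")                  # 10,000,000,000,000 images
print(f"{bytes_per_image:.4f} bytes/image")  # 0.0007 bytes/image
print(f"{raw_image_bytes:,} bytes raw")      # 3,145,728 bytes raw
```

That is well under one byte of model per image, against about 3 MB for a single raw image, so the weights can't be a stored catalogue of outputs.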

I suspect it works similarly to the biological side -- evolutionary pressure encoding complex patterns into hyper-efficient "programs" that aren't easily interpretable, but eerily effective despite their compact size.