Jordan-117 · 7 hours ago
To me, it feels similarly impossible/spooky to how image models work. Consider a model like SDXL:

- each image is 1024x1024, plenty of detail
- the max prompt length is 77 tokens, a solid paragraph
- each image has a seed value between 0 and 9,999,999, with each seed giving a completely different take on the prompt

I can't begin to calculate the upper limit on the number of meaningful, human-readable prompts that fit in 77 tokens, but multiply even an extremely conservative estimate of a million possible prompts by 10 million seeds and it's clear that this model "contains", at minimum, on the order of ten trillion possible meaningful images -- all in a model file that's under 7 GB. I suspect something similar is at work on the biological side -- evolutionary pressure encoding complex patterns into hyper-efficient "programs" that aren't easily interpretable, but are eerily effective despite their compact size.
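To make the back-of-envelope math concrete, here's a minimal sketch using the diffusers library and the public SDXL base checkpoint; the prompt, seed, and the prompt-count figure are illustrative values carried over from the estimate above, not measurements:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Back-of-envelope size of the output space (illustrative numbers from the comment).
prompts = 1_000_000   # extremely conservative count of distinct 77-token prompts
seeds = 10_000_000    # seeds 0 through 9,999,999
print(f"{prompts * seeds:,} distinct (prompt, seed) pairs")  # 10,000,000,000,000

# Each (prompt, seed) pair deterministically selects one image out of that space.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed -> reproducible image
image = pipe("a lighthouse at dusk, oil painting", generator=generator).images[0]
image.save("lighthouse_seed42.png")
```

Rerunning with a different seed (or any edit to the prompt) lands on a different image, which is why the count of reachable outputs multiplies so quickly.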