IshKebab 4 days ago:
Funny how it really wants people to dance. Even the guy sitting down for an interview just starts dancing in his seat.
|
jonas21 4 days ago:
Presumably they're dancing because it's in the prompt. You could change the prompt to have them do something else (but that would be less fun!)

IshKebab 4 days ago:
I'm no expert, but are you sure there is a prompt?

dragonwriter 4 days ago:
Yes. While the page here does not directly mention the prompts, the linked paper does, and the linked code repo shows that prompts are used as well.

vunderba 3 days ago:
100%. I don't think I've ever come across an I2V model that didn't require at least a positive prompt. Some people get around it by integrating a vision LLM into their ComfyUI workflow, though.

IshKebab 3 days ago:
Ah yeah, you're right. They seem to just really like giving dancing prompts. I guess those prompts work well given the training set.
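
To make the prompt requirement concrete, here is a minimal sketch of the workaround vunderba describes: caption the input image with a vision model, then feed the caption (plus the desired motion) to an image-to-video pipeline as its positive prompt. It assumes Hugging Face's diffusers and transformers APIs; the models (BLIP, I2VGen-XL), file names, and parameters are illustrative stand-ins, not taken from the linked paper or repo.

    # Sketch only: assumes diffusers' I2VGenXLPipeline and transformers' BLIP.
    # Model IDs, file names, and parameters are illustrative stand-ins.
    import torch
    from PIL import Image
    from diffusers import I2VGenXLPipeline
    from diffusers.utils import export_to_video
    from transformers import BlipProcessor, BlipForConditionalGeneration

    device = "cuda"

    # Step 1: derive a positive prompt from the input image with a captioner
    # (the "vision LLM in the workflow" trick, in miniature).
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    captioner = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base"
    ).to(device)

    image = Image.open("input.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    caption = processor.decode(
        captioner.generate(**inputs, max_new_tokens=30)[0],
        skip_special_tokens=True,
    )

    # Step 2: the I2V pipeline still requires a text prompt; the motion
    # ("dancing") comes from the prompt, not from the image.
    pipe = I2VGenXLPipeline.from_pretrained(
        "ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16"
    ).to(device)

    frames = pipe(
        prompt=caption + ", dancing",
        image=image,
        negative_prompt="distorted, blurry, low quality",
        num_inference_steps=50,
        guidance_scale=9.0,
        generator=torch.Generator(device=device).manual_seed(0),
    ).frames[0]

    export_to_video(frames, "output.mp4", fps=16)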
|
Jaxkr 4 days ago:
There's a massive open TikTok training set that lots of video researchers use.
|
bravura 4 days ago:
It's a peculiar and fascinating observation you make. With static images, we always look for eyes. With video, we always look for dancing.