skybrian 7 days ago
Like most image generators, it didn’t pass the piano keyboard test. (Black keys are wrong.) https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
joombaga 7 days ago
What is the piano keyboard test? Your link requires granting AI Studio access to Google Drive, which I do not want to do.

Workaccount2 7 days ago
The selling point of this model really seems to be its consistency between generations rather than its raw generating ability. For instance: https://aistudio.google.com/app/prompts/1gTG-D92MyzSKaKUeBu2...

pbhjpbhj 7 days ago
Are there models that have a vector space that includes ideas, not just words/media but also aspects that are not entirely corporeal? So when generating a video of someone playing a keyboard, the model would incorporate the idea of repeating groups of 8 tones, which is a fixed ideational aspect that might not be strongly represented in words adjacent to "piano". It seems like models need help with knowing what should be static, or homomorphic, across or within images associated with the same word vectors, and that words alone don't provide a strong enough basis [*1] for this.
*1 - It's so hard to find non-conflicting words; obviously I don't mean basis as in basis vectors, though there is some weak analogy.

mikepurvis 7 days ago
Interesting! I feel like that's maybe similar to the business of being able to correctly generate images of text: it looks like the idea of a keyboard to a non-musician, but is immediately wrong to someone who is actually familiar with it at all. I wonder if the bot is forced to generate something new. Certainly for a prompt like that it would be acceptable to just pick the first result off a Google image search and be like "there, there's your picture of a piano keyboard".
vunderba 7 days ago
Anything that is heavily periodic can definitely trip up image gen. That being said, I just used Flux Kontext T2I and got pretty close (disregard the hammers, though, since that's a right mess). Only towards the upper register did it start to make mistakes.
psbp 7 days ago
Doesn't pass the analog clock test either.
cubefox 7 days ago
Like most image models, except GPT-4o, it also didn't pass the wooden Penrose triangle test. (It creates normal triangles.)
carimura 7 days ago
Or my "hands with palms facing down" test... no matter how hard I try, it just can't get open hands, palms down.

conception 7 days ago
Failed my horizontal text test as well.