mortenjorck 10 hours ago

This is the first image model I’ve used that passed my piano test. It actually generated an image of a keyboard with the proper pattern of black keys repeated per octave – every other model I’ve tried this with since the first Dall-E has struggled to render more than a single octave, usually clumping groups of two black keys or grouping them four at a time. Very impressive grasp of recursive patterns.

crat3r 9 hours ago | parent | next [-]

If you ask it for anything outside of the standard 88-key set, it falls short. For instance:

"Generate a piano, but have the left most key start at middle C, and the notes continue in the standard order up (D, E, F, G, ...) to the right most key"

The output for the above prompt will be wrong, seemingly every time. The model has no understanding of the keys or where they belong, and it can't intuit how to build something within the actual constraints of how piano notes are patterned.

"Generate a piano but color every other D key red"

This is also wrong, every time, with seemingly random keys being colored.

I would imagine that a keyboard is difficult to render (to some extent), but I also don't think it's particularly interesting, since it is a fully standardized object with millions of pictures from all angles in existence to learn from, right?
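For reference, the layout both prompts ask for is fully deterministic, so it is easy to check an image against it. Here is a rough sketch in Python. The helper names (`keyboard`, `every_other_d`) are made up for illustration, and it assumes the common MIDI numbering where middle C is note 60 (C4) and a standard 88-key piano spans notes 21 (A0) through 108 (C8).

```python
# Hypothetical helpers for illustration; not taken from any library.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK_PITCH_CLASSES = {1, 3, 6, 8, 10}  # C#, D#, F#, G#, A#

def keyboard(start_midi, num_keys):
    """Return (note, octave, is_black) for each key, left to right.

    Assumes MIDI numbering with middle C = 60 = C4.
    """
    keys = []
    for midi in range(start_midi, start_midi + num_keys):
        pitch_class = midi % 12
        octave = midi // 12 - 1
        keys.append((NOTE_NAMES[pitch_class], octave,
                     pitch_class in BLACK_PITCH_CLASSES))
    return keys

def every_other_d(keys):
    """Indices of every other D key, starting with the leftmost D."""
    d_indices = [i for i, (note, _, _) in enumerate(keys) if note == "D"]
    return d_indices[::2]

# First prompt: one octave starting at middle C (W = white, B = black).
octave_from_middle_c = keyboard(60, 12)
pattern = "".join("B" if black else "W" for _, _, black in octave_from_middle_c)

# Second prompt: which keys to color red on a standard 88-key piano.
full_piano = keyboard(21, 88)
red_keys = every_other_d(full_piano)
```

Here `pattern` gives the white/black sequence for one octave read left to right from middle C, and `red_keys` holds the zero-based positions (counting both white and black keys from the left) of the D keys to color. "Every other" is taken to mean starting with the leftmost D, which is one reading of the prompt.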

vunderba 9 hours ago | parent [-]

Yep - one of my go-to benchmarks is a "historical piano", meaning the naturals are black and the sharps/flats are white.

https://imgur.com/a/SZbzsYv

vunderba 10 hours ago | parent | prev [-]

Periodic motion (groups of repeating patterns) always tends to degrade at some point. Maintaining coherence over 88 keys is impressive.