| ▲ | vunderba 7 hours ago |
| Alright, results are in! I've re-run all my editing-based adherence prompts through Nano Banana Pro. NB Pro managed to successfully pass SHRDLU, the M&M Van Halen test (as verified independently by Simon), and the Scorpio street test - all of which the original NB failed. Model results:
1. Nano Banana Pro: 10 / 12
2. Seedream4: 9 / 12
3. Nano Banana: 7 / 12
4. Qwen Image Edit: 6 / 12
https://genai-showdown.specr.net/image-editing
If you just want to see how NB and NB Pro compare against each other: https://genai-showdown.specr.net/image-editing?models=nb,nbp |
|
| ▲ | tylervigen 2 hours ago | parent | next [-] |
| I think Nano Banana Pro’s answer to the giraffe edit is far superior to the Seedream response, but you passed Seedream and failed NB Pro. Maybe that one is just not a good test? |
| |
| ▲ | strbean 4 minutes ago | parent | next [-] | | If you look closely, the NBP giraffe has a gaping hole in its neck. | |
| ▲ | tziki an hour ago | parent | prev [-] | | I agree, it seems like Seedream has the neck at the same length as Nano Banana but also made the giraffe crouch down, making a major modification to the overall picture. |
|
|
| ▲ | sosodev 5 hours ago | parent | prev | next [-] |
| I think Nano Banana Pro should have passed your giraffe test. It's not a great result but it is exactly what you asked for. It's no worse than Seedream's result imo. |
| |
| ▲ | vunderba 5 hours ago | parent | next [-] | | Yeah I think that's a fair critique. It kind of looks like a bad cut-and-replace job (if you zoom in you can even see part of the neck is missing). I might give it some more attempts to see if it can do a better job. I agree that Seedream could definitely be called out as a fail since it might just be a trick of perspective. | | |
| ▲ | sefrost 3 hours ago | parent [-] | | Have you ever considered a “partial pass”? Perhaps it would be an easy cop out of making a decision if you had to choose something outside of pass/fail. | | |
| ▲ | vunderba 2 hours ago | parent [-] | | That's not a bad suggestion. I thought about adding a numerical score but it felt like it was a bit overwhelming at the time. Maybe I should revisit it though in the form of:
Fail = 0 points
Partial = 0.5 points
Success = 1 point
There's definitely a couple of pictures where I feel like I'm at the optometrist and somehow failing an eye exam (1 or 2, A... or B). | | |
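For illustration only, the three-tier scheme above could be tallied roughly like this. This is a minimal sketch: the names `Verdict` and `total_score` and the sample verdict list are hypothetical and are not taken from the site or its actual results.

```python
from enum import Enum


class Verdict(Enum):
    """Point values from the proposed Fail / Partial / Success scheme."""
    FAIL = 0.0
    PARTIAL = 0.5
    SUCCESS = 1.0


def total_score(verdicts):
    """Sum the point values of a list of per-test verdicts."""
    return sum(v.value for v in verdicts)


# Hypothetical verdicts for a 12-test run (not real benchmark data).
verdicts = [Verdict.SUCCESS] * 10 + [Verdict.PARTIAL, Verdict.FAIL]
print(f"{total_score(verdicts)} / {len(verdicts)}")
```

A borderline result such as the giraffe edit would then land on 0.5 instead of forcing a pass/fail call.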
| ▲ | jofzar 39 minutes ago | parent [-] | | I agree with this. Some of those are "passing" and others are really passing, especially given how much better the new model is compared to the old ones. I think the paws one is a good example where I think the new model got 100% while the other was more like 75% |
|
|
| |
| ▲ | aqme28 2 hours ago | parent | prev | next [-] | | I don’t understand at all why Seedream gets a pass there. The neck appears the same length but now it’s at a different angle. | |
| ▲ | jonplackett 3 hours ago | parent | prev | next [-] | | Yeah it’s better than the weirdness of Seedream for sure. | |
| ▲ | kevlened 5 hours ago | parent | prev [-] | | I agree. From where I'm sitting, Seedream just bent the neck while Nano Banana Pro actually shortened the neck. |
|
|
| ▲ | rl3 42 minutes ago | parent | prev | next [-] |
| "Remove all the trash from the street and sidewalk. Replace the sleeping person on the ground with a green street bench. Change the parking meter into a planted tree." Three sentences that do a great job summing up modern big tech. The new model even manages to [digitally] remove all trash. |
|
| ▲ | humamf 5 hours ago | parent | prev | next [-] |
| The Pisa tower test is really interesting. Many of these prompts have stricter criteria that rely on implicit knowledge, and some models impressively pass them. Yet something as obvious as straightening a slanted object is hard even for the latest models. |
| |
| ▲ | kridsdale3 4 hours ago | parent [-] | | I suspect there'd be no problem rotating a different object. But this tower is EXTREMELY represented in the training data. It's almost an immutable law of physics that Towers in Pisa are Leaning. | | |
| ▲ | gridspy 4 hours ago | parent [-] | | It's also a tower that has famously been deliberately left un-straightened just enough to remain a tourist attraction while remaining stable. |
|
|
|
| ▲ | Nifty3929 4 hours ago | parent | prev | next [-] |
| Would you leave one of the originals in each test visible at all times (a control) so that I can see the final image(s) that I'm considering and the original image at the same time? I guess if you do that then maybe you don't need the cool sliders anymore? Anyway - thanks so much for all your hard work on this. A very interesting study! |
|
| ▲ | Wyverald 6 hours ago | parent | prev | next [-] |
| thanks, I love your website. Are you planning to do NB Pro for the text-to-image benchmark too? |
| |
| ▲ | vunderba 2 hours ago | parent | next [-] | | Outside the time frame of being able to edit my original reply, but I've finally re-run the Text-to-Image portion of the site through NB Pro. Results:
gpt-image-1: 10 / 12
Nano Banana Pro: 9 / 12
Nano Banana: 8 / 12
It's worth mentioning that even though it only scored slightly better than the original NB, many of the images are significantly better looking.
https://genai-showdown.specr.net?models=nb,nbp | | |
| ▲ | Wyverald 2 hours ago | parent [-] | | thanks for the update. One small note: for the d20 test, NB Pro had duplications of 13 and 17 too, not just 19. |
| |
| ▲ | vunderba 6 hours ago | parent | prev [-] | | Definitely! Even though NB's predominant use case seems to be editing, it's still producing surprisingly decent text-to-image results. Imagen4 currently still comes out ahead in terms of image fidelity, but I think NB Pro will close the gap even further. I'll try to have the generative comparisons for NB Pro up later this afternoon once I catch my breath. |
|
|
| ▲ | dyauspitr 2 hours ago | parent | prev [-] |
| Seedream's outputs generally look low quality, and it doesn’t seem like you’re assigning points for quality. This is only marginally helpful. |