Alright, results are in! I've re-run all my editing-based adherence prompts through Nano Banana Pro. NB Pro passed SHRDLU, the M&M Van Halen test (as verified independently by Simon), and the Scorpio street test, all of which the original NB failed.

  Model results
  1. Nano Banana Pro: 10 / 12
  2. Seedream4: 9 / 12
  3. Nano Banana: 7 / 12
  4. Qwen Image Edit: 6 / 12

https://genai-showdown.specr.net/image-editing

If you just want to see how NB and NB Pro compare against each other:

https://genai-showdown.specr.net/image-editing?models=nb,nbp

▲ tylervigen 2 hours ago | parent | next [-]

I think Nano Banana Pro’s answer to the giraffe edit is far superior to the Seedream response, but you passed Seedream and failed NB Pro.

Maybe that one is just not a good test?

	▲ strbean 4 minutes ago | parent | next [-]
		If you look closely, the NBP giraffe has a gaping hole in its neck.
	▲ tziki an hour ago | parent | prev [-]
		I agree. It seems like Seedream kept the neck at the same length as Nano Banana but also made the giraffe crouch down, making a major modification to the overall picture.

▲ sosodev 5 hours ago | parent | prev | next [-]

I think Nano Banana Pro should have passed your giraffe test. It's not a great result but it is exactly what you asked for. It's no worse than Seedream's result imo.

▲ vunderba 5 hours ago | parent | next [-]

Yeah I think that's a fair critique. It kind of looks like a bad cut-and-replace job (if you zoom in you can even see part of the neck is missing). I might give it some more attempts to see if it can do a better job.

I agree that Seedream could definitely be called out as a fail since it might just be a trick of perspective.

▲ sefrost 3 hours ago | parent [-]

Have you ever considered a “partial pass”?

Perhaps it would be an easy cop-out from making a decision if you could choose something outside of pass/fail.

▲ vunderba 2 hours ago | parent [-]

That's not a bad suggestion. I thought about adding a numerical score but it felt like it was a bit overwhelming at the time. Maybe I should revisit it though in the form of:

  Fail = 0 points
  Partial = 0.5 points
  Success = 1 point
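
For what it's worth, here is a minimal sketch of how a three-level tally like that could be computed. The names `Verdict` and `total_score` are hypothetical, and the example verdicts are made up purely for illustration; they are not results from the site.

```python
from enum import Enum


class Verdict(Enum):
    """Three-level grade using the point values proposed above."""
    FAIL = 0.0
    PARTIAL = 0.5
    SUCCESS = 1.0


def total_score(verdicts):
    """Sum the points for a list of per-test verdicts."""
    return sum(v.value for v in verdicts)


# Made-up verdicts for a 12-test run (illustration only, not real results).
example = [Verdict.SUCCESS] * 8 + [Verdict.PARTIAL] * 3 + [Verdict.FAIL]
print(f"{total_score(example):g} / {len(example)}")
```

With half points in the mix, two models that tie on strict pass/fail can end up separated by their borderline results.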

There's definitely a couple of pictures where I feel like I'm at the optometrist and somehow failing an eye exam (1 or 2, A... or B).

	▲ jofzar 39 minutes ago | parent [-]
		I agree with this. Some of those are "passing" and others are really passing, especially given how much better the new model is compared to the old ones. I think the paws one is a good example, where the new model got 100% while the other was more like 75%.

▲ aqme28 2 hours ago | parent | prev | next [-]

I don’t understand at all why Seedream gets a pass there. The neck appears the same length but now it’s at a different angle.

▲ jonplackett 3 hours ago | parent | prev | next [-]

Yeah, it’s better than the weirdness of Seedream for sure.

▲ kevlened 5 hours ago | parent | prev [-]

I agree. From where I'm sitting, Seedream just bent the neck while Nano Banana Pro actually shortened the neck.

▲ rl3 42 minutes ago | parent | prev | next [-]

"Remove all the trash from the street and sidewalk. Replace the sleeping person on the ground with a green street bench. Change the parking meter into a planted tree."

Three sentences that do a great job summing up modern big tech. The new model even manages to [digitally] remove all trash.

▲ humamf 5 hours ago | parent | prev | next [-]

The Pisa tower test is really interesting. Many of these prompts have stricter criteria that rely on implicit knowledge, and some models impressively pass them. Yet something as obvious as straightening a slanted object is hard even for the latest models.

▲ kridsdale3 4 hours ago | parent [-]

I suspect there'd be no problem rotating a different object. But this tower is EXTREMELY represented in the training data. It's almost an immutable law of physics that Towers in Pisa are Leaning.

	▲ gridspy 4 hours ago | parent [-]
		It's also a tower that has famously been deliberately left un-straightened just enough to stay a tourist attraction while remaining stable.

▲ Nifty3929 4 hours ago | parent | prev | next [-]

Would you leave one of the originals in each test visible at all times (a control) so that I can see the final image(s) that I'm considering and the original image at the same time?

I guess if you do that then maybe you don't need the cool sliders anymore?

Anyway - thanks so much for all your hard work on this. A very interesting study!

▲ Wyverald 6 hours ago | parent | prev | next [-]

thanks, I love your website. Are you planning to do NB Pro for the text-to-image benchmark too?

▲ vunderba 2 hours ago | parent | next [-]

Outside the time frame of being able to edit my original reply, but I've finally re-run the Text-to-Image portion of the site through NB Pro.

  Results

  gpt-image-1: 10 / 12 
  Nano Banana Pro: 9 / 12
  Nano Banana: 8 / 12

It's worth mentioning that even though it only scored slightly better than the original NB, many of the images are significantly better looking.

https://genai-showdown.specr.net?models=nb,nbp

	▲ Wyverald 2 hours ago | parent [-]
		thanks for the update. One small note: for the d20 test, NB Pro had duplications of 13 and 17 too, not just 19.

▲ vunderba 6 hours ago | parent | prev [-]

Definitely! Even though NB's predominant use case seems to be editing, it's still producing surprisingly decent text-to-image results. Imagen4 currently still comes out ahead in terms of image fidelity, but I think NB Pro will close the gap even further.

I'll try to have the generative comparisons for NB Pro up later this afternoon once I catch my breath.

▲ dyauspitr 2 hours ago | parent | prev [-]

Seedream’s outputs generally look low quality, and it doesn’t seem like you’re assigning points for quality. This is only marginally helpful.