minimaxir 4 hours ago

So during my Nano Banana Pro experiments I wrote a very fun prompt that tests the ability of these image generation models to follow heuristics while still requiring domain knowledge and/or use of the search tool:

    Create a 8x8 contiguous grid of the Pokémon whose National Pokédex numbers correspond to the first 64 prime numbers. Include a black border between the subimages.

    You MUST obey ALL the FOLLOWING rules for these subimages:
    - Add a label anchored to the top left corner of the subimage with the Pokémon's National Pokédex number.
      - NEVER include a `#` in the label
      - This text is left-justified, white color, and Menlo font typeface
      - The label fill color is black
    - If the Pokémon's National Pokédex number is 1 digit, display the Pokémon in a 8-bit style
    - If the Pokémon's National Pokédex number is 2 digits, display the Pokémon in a charcoal drawing style
    - If the Pokémon's National Pokédex number is 3 digits, display the Pokémon in a Ukiyo-e style

The NBP result is here. It got the numbers, the corresponding Pokémon, and the styles correct; the main points of contention are that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...

Running that same prompt through gpt-2-image high gave an...interesting contrast: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...

It produced more inventive styles, and the images appear to be original, but:

- The style logic is applied by row rather than by each Pokédex number, so the styles are wrong

- Several of the Pokémon are flat-out wrong

- The number font is wrong (not Menlo)

- The bottom of the grid isn't square for some reason

Odd results.
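
For reference, the numbers and styles the prompt asks for follow mechanically from its rules. Below is a minimal Python sketch (not from the thread) that computes the first 64 primes and assigns each the style its digit count calls for; the style strings are paraphrased from the prompt, and the prime-to-Pokémon lookup is deliberately left out.

```python
# Style names paraphrased from the prompt's digit-count rules.
STYLE_BY_DIGITS = {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}


def first_n_primes(n: int) -> list[int]:
    """Return the first n primes by trial division (plenty fast for n = 64)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no known prime <= sqrt(candidate) divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def expected_grid(rows: int = 8, cols: int = 8) -> list[list[tuple[int, str]]]:
    """Lay the primes out row-major and pair each with its digit-count style."""
    primes = first_n_primes(rows * cols)
    return [
        [(p, STYLE_BY_DIGITS[len(str(p))]) for p in primes[r * cols:(r + 1) * cols]]
        for r in range(rows)
    ]


if __name__ == "__main__":
    for r, row in enumerate(expected_grid()):
        styles = sorted({style for _, style in row})
        print(r, [p for p, _ in row], styles)
```

Rows 0 and 3 each straddle a digit boundary (7 to 11, and 97 to 101), so no row-by-row style assignment can satisfy the prompt; the full grid should contain 4 8-bit cells, 21 charcoal cells, and 39 Ukiyo-e cells, ending at 311.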

dvt 2 hours ago

This is an amazing test, and it's kinda funny how terrible gpt-2-image is. I'd take "plagiarized" images (e.g. Google search & copy-paste) any day over how awful the OpenAI result is. It doesn't even seem like they have a sanity-check/post-processing step ("did I follow the instructions correctly?"), because the digit-style constraint violation should be easy to catch. It's also expensive as shit just to get an image that's essentially unusable.
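
A post-hoc check of the kind described above can be sketched in a few lines of Python. This assumes each cell's label text and style have already been read off the generated image (by a person or some separate classifier, not shown); `Cell` and `rule_violations` are hypothetical names for illustration, not part of any model's API. It only checks the `#` rule and the digit-count style rule, not whether the Pokémon or primes are right.

```python
from dataclasses import dataclass

# Style names paraphrased from the prompt's digit-count rules.
STYLE_BY_DIGITS = {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}


@dataclass
class Cell:
    """One subimage as annotated after generation (hypothetical input format)."""
    row: int
    col: int
    label: str  # label text as read off the image
    style: str  # style as judged by a human or a separate classifier (assumption)


def rule_violations(cells: list[Cell]) -> list[str]:
    """Return human-readable violations of the '#' rule and the digit-count style rule."""
    problems: list[str] = []
    for c in cells:
        where = f"cell ({c.row}, {c.col})"
        if "#" in c.label:
            problems.append(f"{where}: label {c.label!r} contains '#'")
        digits = c.label.replace("#", "").strip()
        if not digits.isdecimal():
            problems.append(f"{where}: label {c.label!r} is not a number")
            continue
        n_digits = len(str(int(digits)))  # ignore any leading zeros
        expected = STYLE_BY_DIGITS.get(n_digits)
        if c.style != expected:
            problems.append(
                f"{where}: {int(digits)} has {n_digits} digit(s), "
                f"expected {expected!r} style, got {c.style!r}"
            )
    return problems


if __name__ == "__main__":
    # Example annotations: the second cell mimics row-based styling
    # (11 drawn as 8-bit because it shares a row with 2, 3, 5, 7).
    sample = [
        Cell(0, 0, "2", "8-bit"),
        Cell(0, 4, "11", "8-bit"),
        Cell(1, 0, "#23", "charcoal drawing"),
    ]
    for problem in rule_violations(sample):
        print(problem)
```

With these sample annotations, the checker flags the row-styled 11 and the `#` in `#23`, while the correctly styled cells pass.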

the_arun 2 minutes ago

This is from Gemini - https://lens.usercontent.google.com/banana?agsi=CmdnbG9iYWw6...

anshumankmr 15 minutes ago

That is interesting, because I feel like gpt-image-1 did have that feature.

(source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)

hyperadvanced 11 minutes ago

I wouldn’t say it’s terrible, but I also wouldn’t say it’s a huge step forward in quality compared to what I’ve seen from AI before.

rrr_oh_man 2 hours ago

Why would you consider this a good prompt?

minimaxir 2 hours ago

Because both Nano Banana Pro and ChatGPT Images 2.0 are touted as having strong reasoning capabilities, and this particular prompt has more objective, easy-to-validate criteria than the usual subjective judgments of images.

I have more subjective prompts for testing reasoning, but they're your-mileage-may-vary. (That said, gpt-2-image has surprisingly been doing much better on more objective criteria in my test cases.)

o10449366 an hour ago

Because it's quirky and obscure.

HNers love to write prompts that "prove" they're still smarter than AI.

These are likely the same people who ask ridiculous coding questions in interviews to dunk on candidates, even when the questions are totally unrelated to the job.

minimaxir an hour ago

"Quirky and obscure" has the functional benefit of ensuring that the source question is not in the training data and lies outside the median user prompt, which makes it less likely that the model can cheat.

We have enough people complaining about Simon Willison's pelican test.

Bjartr 12 minutes ago

What would make the prompt a better actual evaluation in your judgement?

tailscaler2026 25 minutes ago

still #opentowork huh

codemog an hour ago

Ah yes, also known as C++ enjoyers.