| ▲ | simonw 7 hours ago |
| I've been trying out the new model like this:
    OPENAI_API_KEY="$(llm keys get openai)" \
      uv run https://tools.simonwillison.net/python/openai_image.py \
      -m gpt-image-2 \
      "Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...
Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... |
|
| ▲ | simonw 7 hours ago | parent | next [-] |
| I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...
    OPENAI_API_KEY="$(llm keys get openai)" \
      uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
      -m gpt-image-2 \
      "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
      --quality high --size 3840x2160
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!
I think that image cost 40 cents. |
| |
| ▲ | makira 7 hours ago | parent | next [-] | | Fed into a clear Claude Code max effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
- Raccoon center: roughly (460, 1680)
- Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
- Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780
It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest."
Which is correct! | | |
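The thread doesn't say how the session actually sliced the image, so the following is only a rough sketch of one way to do it: cut the 3840×2160 image into overlapping tiles, inspect each tile separately, then translate a tile-local hit back into full-image pixel coordinates. The tile size, the overlap, and the function names `tile_boxes`, `to_global` and `tiles_containing` are all assumptions made up for illustration.

```python
from typing import Iterator, List, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int = 960, overlap: int = 120) -> Iterator[Box]:
    """Yield overlapping tile boxes that together cover a width x height image.

    The tile and overlap defaults are arbitrary illustrative values.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    step = tile - overlap
    for top in range(0, height, step):
        for left in range(0, width, step):
            # Clip tiles that would run past the right or bottom edge.
            yield (left, top, min(left + tile, width), min(top + tile, height))


def to_global(box: Box, local_xy: Tuple[int, int]) -> Tuple[int, int]:
    """Map a coordinate measured inside a tile back to full-image pixels."""
    left, top, _, _ = box
    return (left + local_xy[0], top + local_xy[1])


def tiles_containing(boxes: List[Box], point: Tuple[int, int]) -> List[Box]:
    """Return every tile whose area includes the given full-image point."""
    x, y = point
    return [b for b in boxes if b[0] <= x < b[2] and b[1] <= y < b[3]]


if __name__ == "__main__":
    boxes = list(tile_boxes(3840, 2160))
    # Which tiles would have shown the reported raccoon center?
    for box in tiles_containing(boxes, (460, 1680)):
        print(box)
```

The overlap is there so that a small subject sitting on a tile boundary still shows up whole in at least one tile. Actually cropping the pixels would need an image library, which is left out here.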
| ▲ | cwillu 6 hours ago | parent | next [-] | | I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon. | | |
| ▲ | makira 6 hours ago | parent [-] | | simonw posted 2 different images: make sure to look at the second one. | | |
| ▲ | cwillu 6 hours ago | parent [-] | | Yeah, I noticed that just now, but too late to delete the comment :p | | |
| ▲ | jaggederest 4 hours ago | parent [-] | | You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments. |
|
|
| |
| ▲ | M3L0NM4N 4 hours ago | parent | prev [-] | | We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find. | | |
| ▲ | nerdsniper 2 hours ago | parent [-] | | There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon. |
|
| |
| ▲ | wewtyflakes 4 hours ago | parent | prev | next [-] | | A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd! | | | |
| ▲ | davebren 7 hours ago | parent | prev | next [-] | | The faces...that's nice that it turned a kid's book into an abomination | | |
| ▲ | keithnz 2 minutes ago | parent | next [-] | | it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my. | |
| ▲ | Filligree 2 hours ago | parent | prev [-] | | By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits. | | |
| ▲ | davebren 2 hours ago | parent [-] | | It could already copy the art styles from its training data, what is the advancement here? |
|
| |
| ▲ | mirekrusin 2 hours ago | parent | prev | next [-] | | Can it generate a non-Halloween version though? This lower-is-better danse macabre, nightmare-inducing ratio feels like an interesting proxy for model capability. | |
| ▲ | 2 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | louiereederson 7 hours ago | parent | prev | next [-] | | The people in this image remind me of early This Person Does Not Exist, in the best way | | | |
| ▲ | gpt5 4 hours ago | parent | prev | next [-] | | I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me. https://postimg.cc/wyxgCgNY | |
| ▲ | ireadmevs 7 hours ago | parent | prev [-] | | I found it on the 2nd image! On the 1st one not yet... |
|
|
| ▲ | makira 7 hours ago | parent | prev | next [-] |
| > though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure
I see an opportunity for a new AI test! |
| |
| ▲ | vunderba 6 hours ago | parent | next [-] | | There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer. It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo, Highlander-style), while also holding up to scrutiny when you examine any individual, ordinary figure. | |
| ▲ | simonw 7 hours ago | parent | prev [-] | | I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it. | | |
|
|
| ▲ | vova_hn2 an hour ago | parent | prev | next [-] |
| Thanks for the image, I will see their faces in my nightmares. |
| |
| ▲ | vunderba an hour ago | parent [-] | | This happens all too frequently when you ask a GenAI model to create an image with a large crowd, especially a “Where’s Waldo?” style scene, where by definition you’re going to be examining individual faces very closely. |
|
|
| ▲ | nerdsniper 2 hours ago | parent | prev | next [-] |
| That is a devilishly difficult prompt for current diffusion tasks. Kudos. |
|
| ▲ | pants2 7 hours ago | parent | prev | next [-] |
| The second 4K image definitely has a raccoon on the left there! Nice. |
|
| ▲ | marricks 3 hours ago | parent | prev | next [-] |
| Like... this has things that AI will seemingly always be terrible at? At some point the level of detail is utter garbo and always will be. An artist who was thoughtful could have some mistakes but someone who put that much time into a drawing wouldn't have:
- Nightmarish screaming faces on most people
- A sign that points seemingly both directions, or the incorrect one for a lake and a first AID tent that doesn't exist
- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...
It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos?? |
| |
| ▲ | p1esk 2 hours ago | parent [-] | | > AI will seemingly always be ...
You do realize that the whole image generation field is barely 10 years old? I remember how I was able to generate mnist digits for the first time about 10 years ago - that seemed almost like magic! |
|
|
| ▲ | ritzaco 7 hours ago | parent | prev | next [-] |
| haha took me a while to notice that one of the buildings is labelled 'Ham radio' |
|
| ▲ | ElFitz 7 hours ago | parent | prev | next [-] |
| Damn. There’s a fun game app to make here ^^ |
| |
| ▲ | dymk 3 hours ago | parent [-] | | Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors. |
|
|
| ▲ | arealaccount 7 hours ago | parent | prev | next [-] |
| I see the raccoon |
|
| ▲ | 7 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | 6 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | tptacek 7 hours ago | parent | prev [-] |
| 5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top." (I don't think it's right). |
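For anyone wanting to compare that percentage answer against the pixel coordinates reported upthread, here is a quick sketch of the conversion. It assumes the answer refers to the same 3840×2160 image, which the thread doesn't confirm, and `percent_to_pixels` and `in_box` are hypothetical helper names. The bounding box values are the approximate ones from the Claude Code answer quoted above.

```python
from typing import Tuple


def percent_to_pixels(pct_left: float, pct_top: float, width: int, height: int) -> Tuple[int, int]:
    """Convert 'N% from the left, M% from the top' into pixel coordinates."""
    return (round(width * pct_left / 100), round(height * pct_top / 100))


def in_box(point: Tuple[int, int], x_range: Tuple[int, int], y_range: Tuple[int, int]) -> bool:
    """Check whether a point falls inside an inclusive x/y bounding box."""
    x, y = point
    return x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]


if __name__ == "__main__":
    # Assumption: same 3840x2160 image as the second gist link.
    guess = percent_to_pixels(57, 48, 3840, 2160)
    # Approximate bounding box quoted upthread: x 370-540, y 1550-1780.
    print(guess, in_box(guess, (370, 540), (1550, 1780)))
```

Under that assumption the guess lands near the middle of the image, well outside the lower-left bounding box.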
| |