danielvaughn 4 hours ago:

I don't know much about AI, but I have this image test that everything has failed at. You basically just present an image of a maze and ask the LLM to draw a line through it along the optimal path. Here's how Nano Banana fared: https://x.com/danielvaughn/status/1971640520176029704?s=46
JamesSwift 3 hours ago:

I just one-shot it with Claude Code (Opus 4.5) using this prompt. It took about 5 minutes and included detecting that it was cheating at first (it drew a line around the boundary of the maze instead), so it added guardrails for that:

```
Create a devenv project that does the following:
Use whatever lib/framework is most appropriate
```

FeepingCreature 17 minutes ago:

I kinda want to know what happens if you make it continue the line by one step, 20 times in a row. A human can draw this gradually; the image model has to draw it all at once.
vunderba 4 hours ago:

In fact, one of the tests I use as part of GenAI Showdown involves both parts of the puzzle: draw a maze with a clearly defined entrance and exit, along with a dashed line indicating the solution to the maze. Only one model (gpt-image-1) out of the 18 tested managed to pass the test successfully. Gemini 3.0 Pro got VERY close.

kridsdale3 4 hours ago:

I have also tried the maze-from-a-photo test a few times and never seen a one-shot success. But yesterday I was determined to succeed, so I let Gemini 3 write a Python GUI app that takes in photos of physical mazes (I have a bunch of 3D-printed ones) and finds the path. This does work. Gemini 3 then one-shot ported the whole thing (which uses CV Python libraries) to a single-page HTML+JS version that works just as well. I gave that to Claude to assess and assign a FAANG hiring level to; it was amazed and said Gemini 3 codes like an L6. Since I work for Google and used my phone in the office to do this, I think I can't share the source or file.
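The actual app wasn't shared, but the core CV step described (turning a photo into a grid a solver can walk) can be sketched. This is a hedged illustration, not the commenter's code; the function name, block size, and brightness threshold are all my own assumptions:

```python
import numpy as np

def photo_to_grid(gray, cell=10, threshold=128):
    """Turn a grayscale maze photo (2-D uint8 array) into a wall/open grid.

    Downsamples the image into cell x cell blocks, then marks a block as a
    wall ('#') if its mean brightness falls below the threshold (maze walls
    are assumed to photograph darker than the open paths). A real pipeline
    would also deskew and crop first; this shows only the binarization step.
    """
    h, w = gray.shape
    rows, cols = h // cell, w // cell
    # Trim to a whole number of blocks, then average each block's pixels.
    blocks = gray[:rows * cell, :cols * cell].reshape(rows, cell, cols, cell)
    means = blocks.mean(axis=(1, 3))
    return np.where(means < threshold, '#', ' ')
```

The resulting character grid can then be fed to any standard graph-search routine to find the path.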
pwagland 4 hours ago:

I tried this with Claude:

```
> [Image #1] Create a unicode "ascii-art" version of this image, with the
  optimal path through the maze highlighted in a solid colour.

I'll create an ASCII art version of this maze with the solution path
highlighted!
```

Suffice it to say, it didn't do either part right.

buildbot 4 hours ago:

That might be an interesting training set, a bunch of mazes…
jiggawatts 4 hours ago:

The reason is that image generators don't iterate on the output the way text-based LLMs do. Essentially they produce the image in "one hit" and can't work through a complex sequence, in the same way you couldn't one-shot this either: take a random maze, glance at it, then go off and draw a squiggle on a transparency. If you placed that on top of the maze, there's virtually no chance you'd have found the solution on the first try.

That's essentially what's going on with these AI models: they struggle because they only get "one step" to solve the problem instead of being able to trace through the maze slowly.

An interesting experiment would be to ask the AI to solve the maze incrementally: draw a line starting at the entrance a little way into the maze, then a little further, and so on until it reaches the end.