Remix.run Logo
vunderba 4 hours ago

In fact, one of the tests I use as part of GenAI Showdown involves both parts of the puzzle: draw a maze with a clearly defined entrance and exit, along with a dashed line indicating the solution to the maze.

Only one model (gpt-image-1) out of the 18 tested managed to pass the test successfully. Gemini 3.0 Pro got VERY close.

https://genai-showdown.specr.net/#the-labyrinth

danielvaughn 4 hours ago | parent [-]

super cool! Interesting note about Seedream 4 - do you think awareness of A* actually could improve the outcome? Like I said, I'm no AI expert, so my intuitions are pretty bad, but I'd suspect that image analysis + algorithmic pathfinding don't have much crossover in terms of training capabilities. But I could be wrong!

vunderba 4 hours ago | parent [-]

Great question. I do wish we had a bit more insight into the exact background "thinking" that was happening on systems like Seedream.

When you think about posing the "solve a visual image of a maze" to something like ChatGPT, there's a good chance it'll try to throw a python VM at it, threshold it with something like OpenCV, and use a shortest-path style algorithm to try and solve it.