simonw 2 days ago
Wow, there's a lot going on with this pelican riding a bicycle: https://gist.github.com/simonw/c31d7afc95fe6b40506a9562b5e83...
alechewitt 2 days ago
Nice work on these benchmarks, Simon. I’ve followed your blog closely since your great talk at the AI Engineers World Fair, and I want to say thank you for all the high-quality content you share for free. It’s become my primary source for keeping up to date.
I’ve been working on a few benchmarks to test how well LLMs can recreate interfaces from screenshots (https://github.com/alechewitt/llm-ui-challenge). From my basic tests, it seems GPT-5.2 is slightly better at these UI recreations.
For example, in the MS Word replica, it implemented the undo/redo buttons as well as the bold/italic formatting that GPT-5.1 handled, and it generally seemed a bit closer to the original screenshot (https://alechewitt.github.io/llm-ui-challenge/outputs/micros...). In the VS Code test, it also added the tabs that weren’t visible in the screenshot! (https://alechewitt.github.io/llm-ui-challenge/outputs/vs_cod...)
Stevvo 2 days ago
The variance is way too high for this test to have any value at all. I ran it 10 times, and every pelican on a bicycle was a better rendition than that one; about half of them you could call perfect.
BeetleB 2 days ago
They probably saw your complaint that 5.1 was too spartan and a regression (I had the same experience with 5.1 in the POV-Ray version - have yet to try 5.2 out...).
tkgally 2 days ago
I added GPT-5.2 Pro to my pelican-alternatives benchmark for the first three prompts:
Generate an SVG of an octopus operating a pipe organ
Generate an SVG of a giraffe assembling a grandfather clock
Generate an SVG of a starfish driving a bulldozer
https://gally.net/temp/20251107pelican-alternatives/index.ht...
GPT-5.2 Pro cost about 80 cents per prompt through OpenRouter, so I stopped there. I don’t feel like spending that much on all thirty prompts.
AstroBen 2 days ago
Seems to be getting more aerodynamic. A clear sign of AI intelligence
fxwin 2 days ago
the only benchmark i trust
minimaxir 2 days ago
Is that the first SVG pelican with drop shadows?
sroussey 2 days ago
What is good at SVG design?
tmaly 2 days ago
seems to be eating something
belter 2 days ago
What happens if you ask for a pterodactyl on a motorbike? I would like to know how much they are optimizing for your pelican...
nightshift1 2 days ago
benchmarks probably should not be used for so long.
tootie 2 days ago
Do you think the big guys are on to your game and have been adding extra pelicans to the training data?