| ▲ | simonw 2 hours ago |
| The pelican is a lot: https://github.com/simonw/llm-gemini/issues/133#issuecomment... Not a great bicycle though, it forgot the bar between the pedals and the back wheel and weirdly tangled the other bars. Expensive too - that pelican cost 13 cents: https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-fl... |
|
| ▲ | hedgehog 2 hours ago | parent | next [-] |
| That pelican looks like it's in Miami for a crypto conference. |
| |
|
| ▲ | tantalor 2 hours ago | parent | prev | next [-] |
| Forgetting the chainstay is typical of asking random people to draw a bicycle. https://www.gianlucagimini.it/portfolio-item/velocipedia/ > most ended up drawing something that was pretty far off from a regular men’s bicycle |
| |
| ▲ | et1337 an hour ago | parent [-] | | Asking random people to write SVG gives even worse results | | |
| ▲ | lxgr 40 minutes ago | parent [-] | | Especially without being able to look at the rendered output! (At least I'd be surprised if modern server-side tool calls regularly include an SVG renderer that can show a rasterized version to the model to iterate on it.) |
|
|
|
| ▲ | irthomasthomas 2 hours ago | parent | prev | next [-] |
| This is a perfect illustration of something I noticed with LLM progress. Ask them to improve an SVG like this, and it never fixes the missing crossbar or disconnected limbs, it just adds more stuff. In this example they have obviously improved greatly, and it contains a ridiculous amount of detail, but they still get the basic shape of the frame wrong. It's weird. And the pattern shows up everywhere: try it with a webpage and it will add more buttons and stuff. I've even experimented with feeding the broken pelican SVGs to an image model to look for flaws, and they still fail to spot the broken elements. edit: fixed human hallucination |
| |
| ▲ | derefr an hour ago | parent | next [-] | | When you say "improve an svg like this", how are you imagining setting that workflow up? Are you just feeding them the SVG to iterate on; or are you giving them access to a browser to look at the rendering of the SVG? I ask because: Insofar as the original pelican test is zero-shot, it effectively serves as a way to test for the presence of a kind of "visual imagination" component within the layers of the model, that the model would internally "paint" an SVG [or PostScript, etc] encoding of an image onto, to then extract effective features from, analyze for fitness as a solution to a stated request, etc. But if you're trying to do a multi-shot pelican, then just feeding back in the SVG produced in the previous attempt really doesn't correspond to any interesting human capability. Humans can't take an SVG of a pelican and iteratively improve upon it just based on our imagined version of how that SVG renders, either! Rather, a human, given the pelican, would simply load the pelican SVG in a browser; look at the browser's rendering of the pelican; note the things wrong with that rendering; and then edit the SVG to hopefully fix those flaws (and repeat.) I imagine current (multi-modal and/or computer-use) LLMs would actually be very good at such an "iterative rendered pelican" test. | | |
| ▲ | irthomasthomas an hour ago | parent [-] | | I'm talking about two types of improvement: model improvement and prompt-based improvement. I am noticing that the baseline output has a lot more going on, so the model has improved, yet it still makes those obvious-looking mistakes with the shape of the frame or disconnected limbs etc. And I am saying that if you take one of these SVGs and ask an LLM to look for flaws, it rarely spots those obvious flaws and instead suggests adding a sunset and a fish in the bird's mouth. |
| |
| ▲ | 40 minutes ago | parent | prev [-] | | [deleted] |
|
|
| ▲ | smcleod 2 hours ago | parent | prev | next [-] |
| I feel like it embodies Google's vibe of an uncool guy trying to stay relevant to the youth pretty well. |
|
| ▲ | nickvec 27 minutes ago | parent | prev | next [-] |
| I enjoy the vaporwave aesthetic it went for. Looks like the pelican has a fish in its mouth too? https://en.wikipedia.org/wiki/Vaporwave |
|
| ▲ | khy an hour ago | parent | prev | next [-] |
| That sun is very similar to the one from the background of this other top HN post about the OS museum: https://news.ycombinator.com/item?id=48195009 |
|
| ▲ | hydra-f 2 hours ago | parent | prev | next [-] |
| Same old issue with Gemini models trying to "enrich" everything |
|
| ▲ | holtkam2 2 hours ago | parent | prev | next [-] |
| at a certain point you're gonna need to change your benchmark because this will end up in the model's training set |
| |
|
| ▲ | gcgbarbosa 2 hours ago | parent | prev | next [-] |
| funny that when I try the same prompt, gemini generates an image, not an SVG.
something is not right. |
| |
| ▲ | simonw 2 hours ago | parent [-] | | That's likely because you're using the Gemini app which has a tool for image generation (nano banana) - I do my tests against the API to avoid any possibility of tool use. | | |
| ▲ | nickmccann 2 hours ago | parent [-] | | This makes me wonder: do you one-shot each pelican, or do you run it a few times to get the best one? |
|
|
|
| ▲ | nashashmi 2 hours ago | parent | prev | next [-] |
| Beats a human by like $10 |
| |
| ▲ | unglaublich 2 hours ago | parent [-] | | So according to Google logic, the value of the pelican is $10-eps.
(They applied that reasoning to conversions via adwords) |
|
|
| ▲ | setgree an hour ago | parent | prev | next [-] |
| `<!-- Pelican Eye / Sunglasses (Cool Retro Aviators) -->` wtf `<!-- Gold Rim -->` WTF?? |
|
| ▲ | 2 hours ago | parent | prev [-] |
| [deleted] |