derefr 2 hours ago
When you say "improve an SVG like this", how are you imagining setting that workflow up? Are you just feeding them the SVG to iterate on, or are you giving them access to a browser to look at the rendering of the SVG?

I ask because, insofar as the original pelican test is zero-shot, it effectively tests for the presence of a kind of "visual imagination" component within the model's layers: something the model would internally "paint" an SVG (or PostScript, etc.) encoding of an image onto, in order to extract useful features from it, analyze it for fitness as a solution to the stated request, and so on.

But if you're trying to do a multi-shot pelican, then just feeding back the SVG produced in the previous attempt doesn't really correspond to any interesting human capability. Humans can't take an SVG of a pelican and iteratively improve upon it based only on our imagined version of how that SVG renders, either! Rather, a human given the pelican would simply load the SVG in a browser, look at the browser's rendering, note what is wrong with it, and then edit the SVG to hopefully fix those flaws (and repeat).

I imagine current (multi-modal and/or computer-use) LLMs would actually be very good at such an "iterative rendered pelican" test.
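A minimal sketch of the "iterative rendered pelican" loop described above, assuming three pluggable steps: `render` (e.g. a browser screenshot of the SVG), `critique` (a vision-capable model listing visible flaws), and `revise` (a model editing the SVG source). All of these names, and `iterative_rendered_pelican` itself, are hypothetical illustrations rather than any real API; the stubs at the bottom are string-based stand-ins so the control flow can run on its own. The point is the structure: the critic judges the rendered image, not the SVG text, which is the step a zero-shot or text-only multi-shot test skips.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical hooks (not a real library API):
Render = Callable[[str], bytes]           # SVG source -> rendered image bytes
Critique = Callable[[bytes], List[str]]   # rendered image -> list of visible flaws
Revise = Callable[[str, List[str]], str]  # SVG source + flaws -> edited SVG source


@dataclass
class Attempt:
    svg: str
    flaws: List[str]


def iterative_rendered_pelican(
    initial_svg: str,
    render: Render,
    critique: Critique,
    revise: Revise,
    max_rounds: int = 5,
) -> List[Attempt]:
    """Render, look, note flaws, edit, repeat; stop when no flaws remain."""
    history: List[Attempt] = []
    svg = initial_svg
    for _ in range(max_rounds):
        image = render(svg)       # look at the rendering, not the source
        flaws = critique(image)   # judge only what is visible in the image
        history.append(Attempt(svg, flaws))
        if not flaws:
            break
        svg = revise(svg, flaws)  # edit the SVG to address the noted flaws
    return history


# Hypothetical stubs standing in for a browser and a model:
def fake_render(svg: str) -> bytes:
    return svg.encode()


def fake_critique(image: bytes) -> List[str]:
    return ["legs disconnected from body"] if b"gap" in image else []


def fake_revise(svg: str, flaws: List[str]) -> str:
    return svg.replace("gap", "joined")


history = iterative_rendered_pelican(
    "<svg>gap</svg>", fake_render, fake_critique, fake_revise
)
```

With the stubs, the first round reports one flaw, the revision removes it, and the second round comes back clean, so the loop stops after two attempts.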
irthomasthomas 2 hours ago
I'm talking about two types of improvement: model improvement and prompt-based improvement. I notice that the baseline output has a lot more going on, so the model has improved, yet it still makes those obvious-looking mistakes with the shape of the frame, disconnected limbs, etc. And I'm saying that if you take one of these SVGs and ask an LLM to look for flaws, it rarely spots those obvious flaws and instead suggests adding a sunset and a fish in the bird's mouth.