| ▲ | marginalia_nu 6 hours ago |
| I tried to make Claude Code, Sonnet 4.6, write a program that draws a fleur-de-lis. No exaggeration, it floundered for an hour before it started to look right. It's really not good at tasks it has not seen before. |
|
| ▲ | ehnto 6 hours ago | parent | next [-] |
| Even with well-understood languages, if there isn't much in the public domain for the framework you're using, it's not really that helpful. You know you're at the edges of its knowledge when you can see the exact forum posts you are looking at showing up verbatim in its responses. I think some industries with mostly proprietary code will be a bit disappointing to use AI within. |
|
| ▲ | jshmrsn 6 hours ago | parent | prev | next [-] |
| Considering that a fleur-de-lis involves somewhat intricate curves, I think I'd be pretty happy with myself if I could get that task done in an hour. Given a harness that allows the model to validate the result of its program visually, and given the models are capable of using this harness to self-correct (which isn't yet consistently true), then you're in a situation where in that hour you are free to do some other work. A dishwasher might take 3 hours to do what a human could do in 30 minutes, but it's still very useful because the machine's labor is cheaper than human labor. |
| |
| ▲ | marginalia_nu 6 hours ago | parent [-] | | I didn't provide any constraints on how to draw it. TBH I would have just rendered a font glyph, or failing that, grabbed an image. Drawing it with vector graphics programmatically is very hard, but a decent programmer would and should push back on that. | | |
| ▲ | zeroxfe 6 hours ago | parent [-] | | > TBH I would have just rendered a font glyph, or failing that, grabbed an image. If an LLM did that, people would be all up in arms about it cheating. :-) For all its flaws, we seem to hold LLMs up to an unreasonably high bar. | | |
| ▲ | marginalia_nu 6 hours ago | parent [-] | | That's the job description for a good programmer though. Question assumptions and requirements, and then find the simplest solution that does the job. Just about anyone can eventually come up with a hideously convoluted HeraldicImageryEngineImplFactory<FleurDeLis>. |
|
|
|
|
| ▲ | comex 6 hours ago | parent | prev | next [-] |
| LLMs are really bad at anything visual, as demonstrated by pelicans riding bicycles, or Claude Plays Pokémon. Opus would probably do better though. |
| |
| ▲ | tartoran 6 hours ago | parent [-] | | How could they be any good at visuals? They are trained on text after all. | | |
| ▲ | comex 6 hours ago | parent | next [-] | | Supposedly the frontier LLMs are multimodal and trained on images as well, though I don't know how much that helps for tasks that don't use the native image input/output support. Whatever the cause, LLMs have gotten significantly better over time at generating SVGs of pelicans riding bicycles: https://simonwillison.net/tags/pelican-riding-a-bicycle/ But they're still not very good. | | |
| ▲ | tartoran 6 hours ago | parent [-] | | I have to admit I'm seeing this for the first time and am somewhat impressed by the results and even think they will get better with more training, why not... But are these multimodal LLMs still LLMs though? I mean, they're still LLMs but with a sidecar that does other things and the training of the image takes place outside the LLMs so in a way the LLMs still don't "know" anything about these images, they're just generating them on the fly upon request. | | |
| |
| ▲ | astrange 6 hours ago | parent | prev | next [-] | | Claude is multimodal and can see images, though it's not good at thinking in them. | |
| ▲ | msephton 6 hours ago | parent | prev | next [-] | | Shapes can be described as text or mathematical formulas. | |
| ▲ | tempest_ 6 hours ago | parent | prev [-] | | An SVG is just text. |
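To make the "an SVG is just text" point concrete, here is a minimal Python sketch that assembles an SVG document as an ordinary string. The helper names `petal_path` and `build_svg`, the proportions, and the colour are all made up for illustration, and the three-petal shape is only a rough stand-in, not an accurate fleur-de-lis. It shows that the whole drawing reduces to coordinates and cubic Bézier control points written out as text.

```python
def petal_path(cx: float, top: float, bottom: float, half_width: float) -> str:
    """Return SVG path data for one symmetric, leaf-shaped petal.

    Hypothetical helper: two cubic Bezier curves from the tip down to the
    base and back up, bulging out by half_width on either side.
    """
    mid = (top + bottom) / 2
    return (
        f"M {cx} {top} "
        f"C {cx + half_width} {mid}, {cx + half_width} {mid}, {cx} {bottom} "
        f"C {cx - half_width} {mid}, {cx - half_width} {mid}, {cx} {top} Z"
    )


def build_svg(size: int = 200) -> str:
    """Assemble a complete SVG document as a plain Python string."""
    cx = size / 2
    centre = petal_path(cx, size * 0.1, size * 0.8, size * 0.18)
    # Side petals reuse the same outline, rotated about a point near the base.
    pivot = f"{cx} {size * 0.7}"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">\n'
        f'  <path d="{centre}" fill="gold"/>\n'
        f'  <path d="{centre}" fill="gold" transform="rotate(-35 {pivot})"/>\n'
        f'  <path d="{centre}" fill="gold" transform="rotate(35 {pivot})"/>\n'
        f"</svg>\n"
    )


if __name__ == "__main__":
    # Redirect stdout to a .svg file and open it in a browser to view it.
    print(build_svg())
```

Nothing here involves an image library: the output is text that a renderer interprets, which is why a text-only model can emit it at all. Whether the numbers it picks look right is a separate question.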
|
|
|
| ▲ | internet2000 6 hours ago | parent | prev | next [-] |
| I got Opus 4.6 to one-shot it; it took 5-ish mins. "Write me a python program that outputs an svg of a fleur-de-lis. Use freely available images to double check your work." It basically just re-created the image from the Wikipedia article on the fleur-de-lis, which I'm not sure proves anything beyond "you have to know how to use LLMs" |
| |
| ▲ | 64738 4 hours ago | parent | next [-] | | Just for reference, Codex using GPT-5.4 and that exact prompt was a 4-shot that took ten minutes. The first result was a horrific caricature. After a slight rebuke ("That looks terrible. Read https://en.wikipedia.org/wiki/Fleur-de-lis for a better understanding of what it should look like."), it produced a very good result but it then took two more prompts about the right side of the image being clipped off before it got it right. | |
| ▲ | robertcope 4 hours ago | parent | prev [-] | | Same, I used Sonnet 4.6 with the prompt, "Write a simple program that displays a fleur-de-lis. Python is a good language for this." Took five or six minutes, but it wrote a nice Python Tk app that did exactly what it was supposed to. |
|
|
| ▲ | scuff3d 4 hours ago | parent | prev | next [-] |
| I tried to use Codex to write a simple TCP to QUIC proxy. I intentionally kept the request fairly simple: take one TCP connection and map it to a QUIC connection. Gave a detailed spec, went through plan mode, clarified all the misunderstandings, let it write it in Python, had it research the API, had it write a detailed step by step roadmap... The result was a fucking mess. Beyond the fact that it was "correct" in the same way the author of the article talked about, there was absolutely bizarre shit in there. As an example, multiple times it tried to import modules that didn't exist. It noticed this when tests failed, and instead of figuring out the import problem it added a fucking try/except around the import and did some goofy Python shenanigans to make it "work". |
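For anyone who hasn't run into the import workaround described above, here is a rough Python sketch of what that pattern tends to look like next to a version that fails loudly. This is not the code Codex produced. The function names and the module name used in the test are invented for illustration.

```python
import importlib


def load_optional_quietly(name: str):
    """Anti-pattern: swallow the ImportError and carry on with a stand-in."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # The tests may go green again, but callers now hold None and will
        # fail later, far away from the real cause.
        return None


def load_required(name: str):
    """Fail at the import site with a message that names the missing module."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"required module {name!r} is not installed or does not exist"
        ) from exc
```

The first function makes the symptom disappear without answering the actual question, which is whether the module exists under that name at all. The second keeps the failure where a reviewer can see it.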
|
| ▲ | tartoran 6 hours ago | parent | prev [-] |
| Have you tried describing to Claude what it is? The more detail, the better the result. At some point it does become easier to just do it yourself. |
| |
| ▲ | parvardegr 43 minutes ago | parent | next [-] | | Agreed with the part that at some point it's better to just do it yourself,
but for sure they will get better and better. | |
| ▲ | marginalia_nu 6 hours ago | parent | prev | next [-] | | It knows what it is, it's a very well known symbol. But translating that knowledge to code is something else. Interesting shortcoming, really shows how weak the reasoning is. | | |
| ▲ | cat_plus_plus 6 hours ago | parent [-] | | Try writing code from description without looking at the picture or generated graphics. Visual LLM with a suggestion to find coordinates of different features and use lines/curves to match them might do better. |
| |
| ▲ | vdfs 6 hours ago | parent | prev [-] | | Most people just forget to tell it "make it quick" and "make no mistake" | | |
| ▲ | mekael 6 hours ago | parent | next [-] | | I’m unable to determine if you’re missing /s or not. | |
| ▲ | tartoran 6 hours ago | parent | prev [-] | | That's kind of foolish IMO. How can an open-ended, generic, and terse request satisfy what users have in mind? |
|
|