croes 10 hours ago

Task:

Create an image of an analog clock that shows 09:30 a.m.

Last time I checked, ChatGPT failed miserably; it took my 10-year-old nephew a minute.

Maybe it's bad to extrapolate those trends because there is no constant growth. What did the same graph look like when self-driving took off, and what does it look like now?

keybits 8 hours ago | parent | next [-]

Claude created an SVG as an artifact for me - it's pretty good: https://claude.site/artifacts/b6d146cc-bed8-4c76-a8cd-5f8c34...

The hour hand is pointing directly at 9 when it should be between 9 and 10.

It got it wrong the first time (the minute hand was pointing at 5). I told it that and it apologised and fixed it.

croes 8 hours ago | parent [-]

Because your image is code.

Try something that outputs pixel.

Then you'll see the curse of limited training data.

kqr 9 hours ago | parent | prev | next [-]

Interesting. I tried this with 3.5 Sonnet and it got it on the first attempt, using CSS transformations to specify the angle of the hour hand.

It failed, even with chain-of-thought prompting, when I asked for an SVG image, because it didn't realise it needed to be careful when converting the angle of the hour hand to Cartesian coordinates. When prompted to pay extra attention to that, it succeeds again.
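For illustration, here is a minimal sketch of the angle-to-Cartesian step described above. It is written in Python and is not the code the model actually produced; the function names, the 200x200 viewBox, and the hand lengths are all assumptions. The two easy mistakes are forgetting that the hour hand advances half a degree per minute, and forgetting that SVG's y axis points down while clock angles are measured clockwise from 12.

```python
import math


def clock_hand_angles(hour, minute):
    """Return (hour_angle, minute_angle) in degrees, clockwise from 12 o'clock."""
    minute_angle = minute * 6.0                      # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 30 degrees per hour, plus drift
    return hour_angle, minute_angle


def hand_endpoint(angle_deg, length, cx=100.0, cy=100.0):
    """Convert a clockwise-from-12 angle to SVG coordinates (y grows downward)."""
    theta = math.radians(angle_deg)
    x = cx + length * math.sin(theta)   # sin, not cos: 0 degrees points up, not right
    y = cy - length * math.cos(theta)   # minus: SVG's y axis is flipped
    return x, y


def clock_svg(hour, minute, size=200):
    """Build a bare-bones SVG clock face (hypothetical layout, no tick marks)."""
    c = size / 2
    h_angle, m_angle = clock_hand_angles(hour, minute)
    hx, hy = hand_endpoint(h_angle, 0.25 * size, c, c)
    mx, my = hand_endpoint(m_angle, 0.40 * size, c, c)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        f'<circle cx="{c}" cy="{c}" r="{0.45 * size}" fill="none" stroke="black"/>'
        f'<line x1="{c}" y1="{c}" x2="{hx:.2f}" y2="{hy:.2f}" stroke="black" stroke-width="4"/>'
        f'<line x1="{c}" y1="{c}" x2="{mx:.2f}" y2="{my:.2f}" stroke="black" stroke-width="2"/>'
        '</svg>'
    )


if __name__ == "__main__":
    print(clock_hand_angles(9, 30))  # (285.0, 180.0)
    print(clock_svg(9, 30))
```

For 09:30 the hour angle comes out at 285 degrees, not 270, which places the hand between 9 and 10; the minute hand at 180 degrees points straight down at 6.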

I would assume models with chain-of-thought prompting baked in would perform better on the first attempt even at an SVG.

croes 8 hours ago | parent [-]

Because your image is code.

Try something that outputs pixel.

kqr 8 hours ago | parent [-]

What does it even mean to "output pixel"? The SVG format can be displayed with pixels, as can TARGA, JPEG, PNG, PostScript, and many others. Which format do you expect it to produce, and why is that specific format the benchmark? Did your nephew produce the correct JPEG bytes in a minute?

Surely you don't expect a language model to move a child's hands and arms to produce the same pencil strokes. It would be the "do submarines swim like fishes" mistake again.

croes 7 hours ago | parent [-]

There is a difference when the image is created as jpg, png etc.

Because those are based on image training data. There is a bias in those images toward showing 10:10 because it's deemed the most aesthetic look.

wickedsight 9 hours ago | parent | prev [-]

I like this one! Just tried it in o3; it generated 10:10 three times. Then it got frustrated and wrote a Python program to do it correctly. Then I passed that image into o4 and got a realistic-looking one... that still showed 10:10.

Search for 'clock' on Google Images though and you'll instantly see why it only seems to know 10:10. I'll keep trying this one in the future, since it really shows the influence of training data on current models.