croes 3 months ago

Task:

Create an image of an analog clock that shows 09:30 a.m.

Last time I checked, ChatGPT failed miserably. It took my 10-year-old nephew a minute.

Maybe it's a bad idea to extrapolate those trends, because growth isn't constant. What did the same graph look like when self-driving took off, and what does it look like now?

keybits 3 months ago | parent | next [-]

Claude created an SVG as an artifact for me - it's pretty good: https://claude.site/artifacts/b6d146cc-bed8-4c76-a8cd-5f8c34...

The hour hand is pointing directly at 9 when it should be between 9 and 10.

It got it wrong the first time (the minute hand was pointing at 5). I told it that and it apologised and fixed it.

croes 3 months ago | parent [-]

Because your image is code.

Try something that outputs pixel.

Then you'll see the curse of limited training data.

kqr 3 months ago | parent | prev | next [-]

Interesting. I tried this with 3.5 Sonnet and it got it on the first attempt, using CSS transformations to specify the angle of the hour hand.

It failed, even with chain-of-thought prompting, when I asked for an SVG image, because it didn't realise it needed to be careful when converting the angle of the hour hand to Cartesian coordinates. When prompted to pay extra attention to that, it succeeds again.
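To make the angle-to-coordinates step concrete, here is a minimal sketch in Python. It is not the code any model produced. The function names (`hand_angles`, `hand_tip`, `clock_svg`), the 200x200 viewBox, and the hand lengths are all assumptions for illustration. The two easy mistakes are forgetting that the hour hand drifts 0.5 degrees per minute, and forgetting that clock angles run clockwise from 12 while the SVG y-axis points down.

```python
import math

def hand_angles(hour, minute):
    """Return (hour_deg, minute_deg), measured clockwise from 12 o'clock."""
    minute_deg = minute * 6.0                      # 360 degrees / 60 minutes
    hour_deg = (hour % 12) * 30.0 + minute * 0.5   # 30 per hour, plus drift
    return hour_deg, minute_deg

def hand_tip(angle_deg, length, cx=100.0, cy=100.0):
    """Convert a clockwise-from-12 angle to SVG coordinates.

    SVG's y-axis grows downward, so "up" is cy - length * cos(theta),
    and clockwise rotation moves x by +length * sin(theta).
    """
    theta = math.radians(angle_deg)
    return cx + length * math.sin(theta), cy - length * math.cos(theta)

def clock_svg(hour, minute):
    """Build a bare-bones SVG clock face (hypothetical layout: 200x200)."""
    hour_deg, minute_deg = hand_angles(hour, minute)
    hx, hy = hand_tip(hour_deg, 50)     # short hour hand
    mx, my = hand_tip(minute_deg, 80)   # long minute hand
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200">'
        '<circle cx="100" cy="100" r="95" fill="white" stroke="black"/>'
        f'<line x1="100" y1="100" x2="{hx:.2f}" y2="{hy:.2f}" '
        'stroke="black" stroke-width="6"/>'
        f'<line x1="100" y1="100" x2="{mx:.2f}" y2="{my:.2f}" '
        'stroke="black" stroke-width="3"/>'
        '</svg>'
    )

if __name__ == "__main__":
    print(clock_svg(9, 30))
```

For 09:30 this puts the hour hand at 285 degrees, halfway between 9 and 10, and the minute hand straight down at 6.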

I would assume models with chain-of-thought prompting baked in would perform better on the first attempt even at an SVG.

croes 3 months ago | parent [-]

Because your image is code.

Try something that outputs pixel.

kqr 3 months ago | parent [-]

What does it even mean to "output pixel"? The SVG format can be displayed with pixels, as can TARGA, JPEG, PNG, PostScript, and many others. Which format do you expect it to do, and why is that specific format the benchmark? Did your nephew produce the correct JPEG bytes in a minute?

Surely you don't expect a language model to move a child's hands and arms to produce the same pencil strokes. It would be the "do submarines swim like fishes" mistake again.

croes 3 months ago | parent [-]

There is a difference when the image is created as JPG, PNG, etc.

That's because those come from image training data. The images in it are biased toward showing 10:10 because it's deemed the most aesthetic look.

wickedsight 3 months ago | parent | prev | next [-]

I like this one! Just tried it in o3, and it generated 10:10 three times. Then it got frustrated and wrote a Python program to do it correctly. Then I passed that image into o4 and got a realistic-looking one... that still showed 10:10.

Search for 'clock' on Google Images though and you'll instantly see why it only seems to know 10:10. I'll keep trying this one in the future, since it really shows the influence of training data on current models.

motoxpro 3 months ago | parent | prev [-]

Now ask your 10-year-old to have a realtime conversation with you in over 50 languages, in a few seconds, with a perfect accent. I don't think it's very useful to pick the thing that humans or AI are worst at and index on that. It seems way more useful to index on what's useful. I truly have no use for an AI that generates an analog clock at a specific time.