Aeolun a day ago

I think claude could have easily used a script to calculate the positions of the planets exactly here, instead of trying to use the frankly horrible image recognition.

fluidcruft a day ago | parent [-]

Claude could probably have written a script to do it. I'm hunting assets to confirm.

Edit: yup very easy for claude. Most of the wall time was spent installing cv2 in a venv (uv dropped python 3.14 on claude, which seemed to get stuck compiling numpy or something so I interrupted and told claude to just use python 3.13 instead).

ME: In this directory there are two images: @p-pressbox.gif and @screenshot.png. @p-pressbox.gif is an image that exists somewhere inside @screenshot.png Write a script that finds the precise location

CLAUDE: Looking at the images, I can see that p-pressbox.gif (the spaceship with "PRESS BOX SHUTTLE" text) appears in the upper-left area of screenshot.png. I'll write a Python script using OpenCV to find the precise location.

[ some venv stuff/telling claude to use uv and python 3.13 ]

CLAUDE: The script found the precise location of p-pressbox.gif within screenshot.png:

  | Measurement         | Value           |
  |---------------------|-----------------|
  | Top-left corner     | (87, 110)       |
  | Bottom-right corner | (218, 166)      |
  | Center              | (152, 138)      |
  | Template size       | 131 × 56 pixels |
  | Match confidence    | 97.2%           |

  The script is at /tmp/search/find_image.py and uses OpenCV's template matching with normalized cross-correlation.

ME: As part of the output create a visualization of the @p-pressbox.gif positioned on top of the original screenshot

CLAUDE: Done! The visualization shows a green rectangle highlighting the exact location where p-pressbox.gif was found within the screenshot. The match is at coordinates (87, 110) to (218, 166).

I confirmed the result was correct.
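
For anyone following along: the script Claude wrote isn't shown in this thread, so here is only a minimal, dependency-free sketch of what "template matching with normalized cross-correlation" computes. The function name `find_template` and the nested-list image format are assumptions made for illustration. It slides the template over the image and scores each offset with a zero-mean normalized correlation, where 1.0 is a perfect match.

```python
import math


def find_template(image, template):
    """Return ((x, y), score) of the best zero-mean NCC match.

    image and template are 2-D lists of grayscale values (rows of numbers).
    (x, y) is the top-left corner of the best-matching window.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])

    # Pre-compute the template's deviations from its mean and their norm.
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))

    best_xy, best_score = None, -2.0
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = [image[y + j][x + i] for j in range(th) for i in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_dev = [v - p_mean for v in patch]
            p_norm = math.sqrt(sum(d * d for d in p_dev))
            if t_norm == 0 or p_norm == 0:
                continue  # flat window: correlation is undefined, skip it
            score = sum(a * b for a, b in zip(t_dev, p_dev)) / (t_norm * p_norm)
            if score > best_score:
                best_xy, best_score = (x, y), score
    return best_xy, best_score
```

The bottom-right corner is just the top-left corner plus the template size, which is consistent with the table above: (87 + 131, 110 + 56) = (218, 166). With OpenCV installed, the same search is normally done with `cv2.matchTemplate` followed by `cv2.minMaxLoc`, which is far faster than this pure-Python loop. The thread doesn't say which matching mode the real script used.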

thecr0w a day ago | parent [-]

Oh what the heck. That worked really well for you. Would you be willing to recreate all the html and push it up to github? I'll drop the repo at the top of the blog post. It would be really cool for me to see this completely done and a great way to finish out the blog post. I obviously couldn't do it.

fluidcruft a day ago | parent [-]

I got pretty far with this initial prompt:

    spacejam-1996.png is a full screenshot of the Space Jam 1996
    landing page. We want to recreate this landing page as faithfully
    as possible, matching the screenshot exactly.

    The asset directory contains images extracted from the original
    site. One of the images is tiled as the background of the landing
    page. The other images should appear once in the screenshot. Use
    these images as assets.

    Precise positioning is very important for this project, so you
    should writing a script that finds the precise location of each
    asset image in screenshots. Use the tool to detect precise
    positions in the target and fine tune the generated webpage. Be
    sure to generate diagnostic images that can be easily reviewed by
    a human reviewer.

    Use python 3.13 and uv to create a venv while working.

I just let Claude (Opus 4.5) do anything it wanted to do as it went.

At this point all the image assets are pixel perfect but the footer is in the wrong place and I had to hold Claude's hand a bit to get the footer into the approximately correct spot:

    I noticed you were struggling to find the position of the footer
    text. You could try rendering two versions of the generated page, the
    second time with the footer text black. Subtracting those two images
    should give you a clean view of the footer text.
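
The two-render trick in that hint can be sketched roughly as follows. This is an illustration, not what Claude actually wrote: `diff_bounding_box` is a hypothetical helper, and the renders are assumed to be same-sized grids of `(r, g, b)` tuples. Any pixel that changes between the two renders must belong to the footer text, so the bounding box of the changed pixels is the text's location.

```python
def diff_bounding_box(render_a, render_b, threshold=0):
    """Bounding box (left, top, right, bottom) of pixels that differ.

    render_a and render_b are same-sized 2-D lists of (r, g, b) tuples.
    right and bottom are exclusive. Returns None if the renders match.
    """
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(render_a, render_b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            # A pixel counts as changed if any channel moved past the threshold.
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1
```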

At this point Claude was having trouble because it hadn't gotten a clean view of the target text location in the original screenshot. It was creating scripts that looked at the red channel in the bottom half of the image to pull out the text, but that was also grabbing part of the site map logo. (Interestingly it made a comment about this but didn't do anything about it.) So I gave it this additional hint:

    You are getting confused with the site map when analyzing the
    original screenshot. You could blank out the positions of assets
    so that they are not interfering with your analysis.

This got the footer into the correct location, but the fonts, font sizes, etc. are not correct yet.
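
A rough sketch of the "blank out the assets" idea, for the curious. The helper name `blank_regions` and the box format are assumptions for illustration. The boxes would be the asset positions already found by template matching, so anything left in the image afterwards (like the footer text) can be analyzed without the site map logo getting in the way.

```python
def blank_regions(pixels, boxes, fill=(0, 0, 0)):
    """Return a copy of pixels with each box painted over with fill.

    pixels is a 2-D list of (r, g, b) tuples. Each box is
    (left, top, right, bottom) with right and bottom exclusive.
    """
    out = [list(row) for row in pixels]  # copy so the input stays untouched
    height, width = len(out), len(out[0])
    for left, top, right, bottom in boxes:
        # Clamp to the image so boxes that hang over the edge are safe.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                out[y][x] = fill
    return out
```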

thecr0w a day ago | parent [-]

> Interestingly it made a comment about this but didn't do anything about it

Classic.

This is awesome. Great work. Please follow up again if you happen to nail it.

fluidcruft 20 hours ago | parent [-]

It's now got everything close after adding this final prompt:

    We are very close. The footer is positioned in roughly the correct location
    but the fonts, font sizes, font color and line spacings are all slightly
    off.

This took quite a while, and it built a few more tools to get there. The result looked fine from a distance, but it was using a sans-serif font where the screenshot has a serif one, etc. So I decided to push. From here it got very messy...

One of the issues is that Claude's text detection was getting tripped up by writing scripts using RGB space instead of something more hue-aware. It knew the text was red but was trying to isolate it by just looking at the red channel. But the grey dots from the background show up bright in the red channel so Claude would think those were center dots between the links that needed to be reproduced in the text. I gave it a hint:

    I think dots from the background image are causing issues. Are you detecting the text
    by looking only at the red channel in RGB space? The red channel will be bright on 
    white pixels in RGB. You could try using hue to separate text from background or use
    distance from the target RGB value.
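
To make the red-channel problem concrete, here is a small sketch comparing the three approaches mentioned in the hint. The function names, thresholds, and sample colors are all assumptions for illustration (the actual text color in the screenshot isn't given in this thread). A grey or white pixel has a high red value, so a red-channel test accepts it, while a hue/saturation test or a distance-to-target test rejects it.

```python
import colorsys
import math


def red_channel_mask(pixel, min_red=128):
    """Naive test: only looks at how bright the red channel is."""
    r, _g, _b = pixel
    return r >= min_red


def is_red_by_hue(pixel, max_hue_dist=0.05, min_sat=0.5, min_val=0.3):
    """Hue-aware test: hue near red, and saturated enough not to be grey."""
    r, g, b = (c / 255 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_dist = min(h, 1.0 - h)  # red sits at hue 0, which wraps around at 1
    return hue_dist <= max_hue_dist and s >= min_sat and v >= min_val


def is_near_color(pixel, target, max_dist=60):
    """Distance test: Euclidean distance from a target RGB value."""
    return math.dist(pixel, target) <= max_dist
```

With a pure red `(255, 0, 0)` standing in for the text and `(192, 192, 192)` standing in for a background dot, `red_channel_mask` accepts both, while `is_red_by_hue` and `is_near_color` accept only the red one.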

Claude decided to switch to HSV space. But it took quite a bit of effort to keep Claude remembering to use HSV, because the tools it had already written still used RGB, and the intermediate images they had produced were stale too. Then it would try to step back and get a big picture as a sanity check and "discover" it had missed the dots that are obviously there. And when you told it there are no dots, you got the "You're absolutely right! They're vertical bars!" response. So it was a struggle. This is the closest I got:

https://imgur.com/a/79Iv1jO

Again, the top image stuff was done in one shot with the first prompt. Everything else has been about the footer. Claude has been writing a lot of clever scripts to measure font metrics, pick fonts, etc., but it keeps falling over those dots. I could probably get it to work better by adding directives for text handling to CLAUDE.md and nuking the context and some of the scripts it created.

thecr0w 5 hours ago | parent [-]

ha this is awesome. I'm going to link this comment in the post. Great work. lmk if you end up pushing it up to github.