cortexosmain 8 hours ago

Hi HN! I built this because I was frustrated that no LLM actually "sees" a video — Claude won't accept video files, ChatGPT reads the transcript only, and Gemini samples at a fixed 1fps (missing fast cuts, over-sampling static slides).

claude-real-video takes a URL or local file and:

1. Extracts frames at every scene change (not fixed intervals) + a density floor
2. Deduplicates with a sliding-window pixel-diff algorithm (so A-B-A interview cutaways don't re-send the same shot)
3. Transcribes audio (prefers embedded subtitles, falls back to Whisper)
4. Optionally keeps the full soundtrack for audio-capable models
5. Writes a clean MANIFEST.txt you can drop into any LLM chat
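
To make step 1 concrete, here is a rough sketch of how scene-change timestamps and a density floor could be combined. This is not the tool's actual code: the function name `keyframe_times`, the `max_gap` parameter, and its 30-second default are assumptions made for illustration, and the scene-change timestamps are taken as already detected (for example by ffmpeg).

```python
def keyframe_times(scene_changes, duration, max_gap=30.0):
    """Return sorted timestamps (seconds) at which to grab a frame.

    scene_changes: timestamps where a cut was detected (assumed input)
    duration:      total length of the video in seconds
    max_gap:       hypothetical density floor; consecutive kept frames
                   are never more than this many seconds apart
    """
    times = [0.0]  # always keep the first frame
    for t in sorted(set(scene_changes)) + [duration]:
        # Fill long static stretches so they are not skipped entirely.
        while t - times[-1] > max_gap:
            times.append(times[-1] + max_gap)
        # Keep the scene change itself (the end of the video is only a bound).
        if times[-1] < t < duration:
            times.append(t)
    return times


if __name__ == "__main__":
    print(keyframe_times([5.0, 70.0], duration=100.0, max_gap=30.0))
```

With cuts at 5 s and 70 s in a 100 s clip, the long static stretch between them gets two extra samples, so a slide that stays on screen for a minute is still represented.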

A 10-min presentation goes from ~600 fixed-interval frames to 5-15 meaningful keyframes. 90%+ token savings with better comprehension.

The dedup approach (v0.2.0) uses real pixel difference on 16x16 RGB thumbnails against a sliding window of the last N kept frames — inspired by videostil's pixelmatch, but simpler and self-contained.
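
A minimal sketch of the sliding-window idea, assuming thumbnails are already decoded into lists of (r, g, b) tuples. The names `diff_fraction` and `dedup`, the per-channel tolerance `tol`, the window size, and the 10% threshold are all illustrative assumptions, not the values or functions the package uses.

```python
from collections import deque


def diff_fraction(a, b, tol=16):
    """Fraction of pixels that differ between two equally sized thumbnails.

    a, b: sequences of (r, g, b) tuples, e.g. 256 entries for 16x16.
    tol:  assumed per-channel tolerance before a pixel counts as changed.
    """
    changed = sum(
        1 for p, q in zip(a, b)
        if max(abs(x - y) for x, y in zip(p, q)) > tol
    )
    return changed / len(a)


def dedup(thumbs, window=5, threshold=0.10):
    """Return indices of frames to keep.

    A frame is kept only if it differs by at least `threshold` from
    every one of the last `window` kept frames.
    """
    recent = deque(maxlen=window)  # oldest kept frame falls out automatically
    kept = []
    for i, thumb in enumerate(thumbs):
        if all(diff_fraction(thumb, prev) >= threshold for prev in recent):
            kept.append(i)
            recent.append(thumb)
    return kept


if __name__ == "__main__":
    A = [(0, 0, 0)] * 256        # shot A: all black
    B = [(255, 255, 255)] * 256  # shot B: all white
    print(dedup([A, B, A, A, B], window=5))  # window remembers both shots
    print(dedup([A, B, A, A, B], window=1))  # only compares to previous keep
```

Comparing against a window rather than only the previous kept frame is what handles the A-B-A cutaway case: with `window=1` the return to shot A looks new again, while a larger window still remembers it.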

`--report` generates a self-contained HTML showing every keep/drop decision with diff percentages, so you can tune the threshold visually.

pip install claude-real-video && crv "https://youtube.com/watch?v=..." --report

MIT licensed, pure Python + ffmpeg. Happy to answer questions!

garciasn 5 hours ago | parent | next [-]

I gave Claude a video provided by a county attorney for a speeding ticket I got. It was spot on in its analysis, even though I don’t like what the video showed.

What does it mean that Claude can’t view video? It did it just fine. Or do you mean tool-less?

torhorway 5 hours ago | parent [-]

Yeah, I'm pretty sure Claude Code can handle videos. It's been doing frame-by-frame analysis for me with generated video to iterate on pipelines.

AmazingEveryDay 5 hours ago | parent | prev | next [-]

I think a more or less clunky name like 'llm video preprocessor' would be a better description. In any case, it seems like you came up with a good project idea. I wonder how long until the SOTA models will just have this kind of functionality built in.

ProofHouse 6 hours ago | parent | prev [-]

Very cool. I have something along these lines as well. I’ll dig into yours over the next few days and contribute where and if I can too. Awesome to see!