sorenjan 4 hours ago

The cropdetect example made me wonder if they're thinking about including support for YOLO or similar models. They're already including Whisper for speech to text, and I think YOLO would enable things like automatic face censoring and general frame-content-aware editing. Or maybe Segment Anything, to have more fine-grained masks available.

On the other hand, when I compared the binaries (ffmpeg, ffprobe, ffplay) I downloaded the other day with the ones I had installed since around September, the new ones were almost 100 MB larger. I don't remember the exact size of the old ones, but the new ones are 640 MB and the old ones were well under 600 MB. The only difference in included libraries was Cairo and the JPEG-XS lib. So while I think a bunch of new ML models would be really cool, maybe they don't want to go down that route. But some kind of pluggable system with accelerated ML models would be helpful, I think.