HPsquared 3 months ago

I suppose the gold standard would be a multimodal model that also looks at the screen (maybe only if the captions aren't making much sense).