	▲	sailingparrot 4 hours ago
		> you don't need to make a video model. You probably don't need to decode the latents at all.

		If you don't decode, how do you judge quality, given that metrics for generative models are famously hard to define and imprecise? And how do you integrate RLHF/RLAIF into your pipeline without decoding? That step is no longer something you can skip if you want SotA results. Just look at the companies explicitly aiming for robotics/simulation: they are building video models.