Remix clone Hacker News

new | show | ask | jobs Github

	▲	almaight 3 hours ago
		"multi-modal feature extraction → semantic translation → cross-modal feature transfer → precise temporal alignment," is all we need