Remix clone Hacker News

	▲	derac 2 hours ago
		Look at the table of supported modalities. It can take image/video/text/actions as input and produce image/video/text/actions as output.
	▲	causal an hour ago
		That just raises more questions. What kind of "observation or action" image does the input generate? What is an action output if it's not text?