Remix.run Logo
SoftTalker 5 hours ago

LLMs are trained on text. Why would we expect them to understand a visual and tactile 3D world?

azinman2 5 hours ago | parent [-]

Because they’re also multimodal vLLMs.