10xDev 5 hours ago
Whether it is text or an image, it is just bits for a computer. A token can represent anything.
A_D_E_P_T 5 hours ago
Sure, but don't conflate the representation format with the structure of what's being represented. Everything is bits to a computer, but text training data captures the flattened, after-the-fact residue of baseline human thought: someone's written description of how something works. (At best!) A world model would need to capture the underlying causal, spatial, and temporal structure of reality -- the thing itself, that which generates those descriptions. You can tokenize an image just as easily as a sentence, sure, but a pile of images and text won't give you a relation between the system and the world. A world model, in theory, can. I mean, we ought to be sufficient proof of this, in a sense...