cortesoft 11 days ago

LLMs also have other inputs, like audio and images. They get encoded (just like a human eye encodes an image) and passed to the weights.

vermilingua 11 days ago

I don’t think this analogy holds. The whole way through the processing pipeline in the brain, different sensory data is ingested separately and processed separately; and we still don’t understand how that data is then integrated into a cohesive experience.

LLMs have the same fundamental input regardless of modality: tokens. There is a preprocessing step before the “brain”, which is more akin to some super-synesthesia where all senses are translated into sound before becoming experience.

cortesoft 10 days ago

Can't you say the same about the connection between your senses and your brain? Your eyes do 'preprocessing', but in the end the signal reaches your brain as electrical impulses. All senses get translated into some sort of electrical signal, just as every modality gets translated into tokens in an LLM.