harrall a day ago |
I don’t have a deep understanding of LLMs, but don’t they fundamentally work on tokens, building a multi-dimensional statistical relationship map between them? So it doesn’t have to be language. You could theoretically have image tokens too (I don’t know how that works in practice, but the important part is the statistical map). And my brain seems to work like that as well. When I make a joke in a group conversation, I can clearly observe my brain pull together related “tokens” (Mary just talked about X, X is related to Y, Y is relevant to Bob), filter them, sort them, and then spit out a joke. And that happens in less than a second.
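The shared-token idea the comment gestures at can be sketched in a toy way: split an image into patches and project each patch into the same embedding space that text tokens live in (roughly the idea behind ViT-style multimodal models). All names, dimensions, and the random "learned" projection below are purely illustrative, not any real model's API.

```python
import numpy as np

EMBED_DIM = 16
rng = np.random.default_rng(0)

# Toy text vocabulary -> embedding lookup table (hypothetical values)
vocab = {"Mary": 0, "talked": 1, "about": 2, "X": 3}
text_table = rng.normal(size=(len(vocab), EMBED_DIM))

def embed_text(words):
    """Map words to token embeddings via table lookup."""
    return text_table[[vocab[w] for w in words]]

def embed_image(img, patch=4):
    """Split an image into patch x patch tiles, flatten each tile,
    and project it into the shared embedding space."""
    h, w = img.shape
    tiles = [
        img[i:i + patch, j:j + patch].ravel()
        for i in range(0, h, patch)
        for j in range(0, w, patch)
    ]
    # In a real model this projection is learned; here it's random.
    proj = rng.normal(size=(patch * patch, EMBED_DIM))
    return np.stack(tiles) @ proj

text_tokens = embed_text(["Mary", "talked", "about", "X"])   # 4 tokens
image_tokens = embed_image(rng.normal(size=(8, 8)))          # 4 patches

# Both modalities now form one sequence of EMBED_DIM vectors,
# so a single transformer could attend over them jointly.
sequence = np.concatenate([text_tokens, image_tokens])
print(sequence.shape)  # (8, 16)
```

The point of the sketch is just that once everything is an embedding vector, the downstream statistical machinery doesn’t care whether a token started life as a word or an image patch.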
tacitusarc a day ago | parent |
Yes! Absolutely. And this is likely what would be necessary for anything approaching actual AGI — and not just visual input, but all kinds of sensory input. The problem is that we are nowhere close to being able to process all of that at even a human level yet, much less at the level of some super-genius being.