ludwigschubert 9 hours ago
The user you originally replied to specifically mentioned
> without going to text first
adastra22 9 hours ago
Yeah, and that's my understanding. Nothing goes video -> text, or audio -> text, or even text -> text without first going through state space. That's where the core of the transformer architecture is.
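
To make that concrete, here is a toy sketch of the idea, not any real model's code. Every name and size in it (`D_MODEL`, `encode_text`, `encode_audio`, `decode_text`, the 8-feature audio frame) is made up for illustration, and the attention layers that would sit between encoding and decoding are left out entirely. It only shows that each modality has its own input projection into one shared hidden space, and that text output is read from that space rather than from another modality's text.

```python
import random

D_MODEL = 4                      # width of the shared hidden space (toy value)
VOCAB = ["yes", "no", "maybe"]   # hypothetical tiny text vocabulary
AUDIO_FEATURES = 8               # hypothetical size of one audio frame

def make_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

rng = random.Random(0)
text_in = make_matrix(D_MODEL, len(VOCAB), rng)     # token -> hidden
audio_in = make_matrix(D_MODEL, AUDIO_FEATURES, rng)  # audio frame -> hidden
text_out = make_matrix(len(VOCAB), D_MODEL, rng)    # hidden -> token scores

def encode_text(token):
    """Map a text token to a hidden vector via a one-hot lookup."""
    one_hot = [1.0 if t == token else 0.0 for t in VOCAB]
    return matvec(text_in, one_hot)

def encode_audio(frame):
    """Map an audio frame to a hidden vector of the same width."""
    return matvec(audio_in, frame)

def decode_text(hidden):
    """Pick the highest-scoring text token for a hidden vector."""
    scores = matvec(text_out, hidden)
    return VOCAB[scores.index(max(scores))]

h_text = encode_text("maybe")
h_audio = encode_audio([0.1] * AUDIO_FEATURES)

# Both inputs land in the same D_MODEL-wide space, so the same
# output head can be applied to either one.
print(len(h_text), len(h_audio))
print(decode_text(h_text), decode_text(h_audio))
```

The weights are random, so the decoded tokens mean nothing. The point is only the shape of the pipeline: audio -> hidden -> text, with no intermediate text step.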