| ▲ | samrus 3 hours ago | |
What's the first L stand for? That's not just vestigial: their model of the world is formed almost exclusively from language, rather than from a range of things contributing significantly, as it is for humans. The biggest thing that's missing is actual feedback on their decisions. They have no "idea" of that, because transformers and embeddings don't model that yet. And language descriptions and image representations of feedback aren't enough. They are too disjointed. It needs more | ||