CamperBob2 2 days ago
> We’ve known for well over a decade that you cannot cram real-world spatial dynamics into those models. It is a clear impedance mismatch.

Then again, not much that we "knew" a decade ago is still relevant today. Of course transformer networks have proven capable of representing spatial intelligence. If not, how could they work with 2D images?