soulofmischief 6 days ago
Increasing the fidelity and richness of training data does not go against the Bitter Lesson. The model can learn a 3D representation on its own from stereo captures, but stereo captures still offer richer, more connected data to learn from than monocular captures. This is unarguable. You're needlessly making things harder by forcing the model to also learn to estimate depth from monocular images, and robbing it of a channel for error correction in the case of faulty real-world data.
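The depth signal the comment refers to can be sketched with the standard disparity-to-depth relation for a rectified stereo rig: depth is directly recoverable from pixel disparity, whereas a monocular model has to infer the same quantity from context alone. A minimal illustration (the focal length, baseline, and disparity values are hypothetical):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters of a point observed with the given pixel disparity,
    assuming a rectified stereo pair: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline, 35 px disparity
print(depth_from_disparity(700.0, 0.12, 35.0))  # -> 2.4 (meters)
```

This is only the geometric relation, not a claim about how a learned model uses it; the point is that the second view turns depth into a measurable quantity rather than an inference problem.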
WithinReason 6 days ago | parent
Stereo images have no explicit 3D information; they are just 2D sensor data. But even if you wanted to use stereo data, you would restrict yourself to stereo datasets and be unable to train on the 99.9% of video out there that wasn't captured in stereo. That's the part that goes against the Bitter Lesson.