sebzim4500 a day ago
If every photo in streetview was included in the training data of a multimodal LLM it would be like 99.9999% of the training data/resource costs. It just isn't plausible that anyone has actually done that. I'm sure some people include a small sample of them, though.
bluefirebrand a day ago
Why would every photo in streetview be required in order to have Geoguessr's dataset in the training data?
clbrmbr 21 hours ago
Yet. This is a good rebuttal when someone quips that we “are about to run out of data”. There’s oh so much more, just not in the form of books and blogs. |