| ▲ | suddenlybananas 2 days ago |
| Why do you say it's not included? Why wouldn't they include it? |
|
| ▲ | sebzim4500 a day ago | parent [-] |
| If every photo in Streetview were included in the training data of a multimodal LLM, it would account for something like 99.9999% of the training data and resource costs. It just isn't plausible that anyone has actually done that. I'm sure some people include a small sample of them, though. |
| |
| ▲ | bluefirebrand a day ago | parent | next [-] | | Why would every photo in streetview be required in order to have Geoguessr's dataset in the training data? | | |
| ▲ | bee_rider a day ago | parent [-] | | I’m pretty sure they are saying that Geoguessr just pulls directly from Google Streetview. There isn’t a separate Geoguessr dataset; it just pulls from Google’s API (at least that’s what Wikipedia says). | | |
| ▲ | bluefirebrand a day ago | parent [-] | | I suspect that Geoguessr's dataset is a subset of Google Streetview, but maybe it really is just pulling everything directly. | | |
| ▲ | bee_rider a day ago | parent [-] | | My guess would be that they pull directly from Streetview, maybe with some extra filtering for interesting locations. Why bother to create a copy, if it can be avoided, right? |
|
|
| |
| ▲ | clbrmbr 21 hours ago | parent | prev [-] | | Yet. This is a good rebuttal when someone quips that we “are about to run out of data”. There’s oh so much more, just not in the form of books and blogs. |
|