barcode_feeder 2 days ago

I gave it a series of 11 images stripped of all metadata. It performed quite well, misidentifying only the two taken in a small college town in the northeastern US. It got both questions correct on photos taken in Korea (one with a fairly clear view of Haneul Park, the other a rather difficult-to-identify picture of Sunrise Peak that doesn't resemble anything on Google). It got every other question in the US correct, ranging from an under-construction stretch of Austin shot from the river to some fairly difficult shots in NYC (the upper halves of some buildings seen from the Rockefeller terrace, the black wall of MoMA). While not perfect, I'm frankly shocked at how well it performed.

thomasfromcdnjs 2 days ago | parent | next [-]

I uploaded an image I screenshotted off Google Street View (no metadata) and it got within 200 m.

https://chatgpt.com/share/6801bbf7-fd40-8008-985d-75c8813f55...

Here's the chat.

Weirdly it said, "I’ve seen that exact house before on Google Street View when exploring Cairns neighborhoods."

geysersam 2 days ago | parent | next [-]

> Weirdly it said, "I’ve seen that exact house before on Google Street View when exploring Cairns neighborhoods."

That's slightly creepy!

oezi 2 days ago | parent | next [-]

The anthropomorphisation certainly is weird, but the technical aspect seems even weirder. Did OpenAI really build dedicated tools to have their models train on Google Street View? Or do they have generic technology for browsing complex sites like Street View?

comex 2 days ago | parent [-]

It’s just a hallucination, same idea as o3 claiming that it uses its laptop to mine Bitcoin:

https://transluce.org/investigating-o3-truthfulness

I doubt the model was trained on Street View, but even if it was, LLMs don’t retain any “memory” of how/when they were trained, so any element of truthfulness would be coincidental.

geysersam 2 days ago | parent [-]

If it was trained on Street View data, it's not unlikely that the model can associate a particular image with Street View. For example, a picture can carry telltale signs of Street View content, such as blurred faces and street signs, watermarks, etc.

Even if it wasn't directly trained on Street View data, it has probably encountered Street View content in its training dataset.

namaria 14 hours ago | parent | next [-]

The training process doesn't preserve the information the LLM would need to infer that. It can't be anything other than nonsense that sounds plausible, which is what these models do best.

oezi 2 days ago | parent | prev [-]

I think the test which the OP performed (to pick a random street view and let it pinpoint it) would indicate that it has ingested some kind of information in this regard in a structured manner.

casey2 2 days ago | parent | prev [-]

They should definitely add that feature.

Tell it your name and then it just looks you up and street views your house, and puts that all into memory.

bluesnews 2 days ago | parent | prev | next [-]

It might be trained on Street View.

thomasfromcdnjs 2 days ago | parent | prev | next [-]

This was the image: https://imgur.com/a/cCUvgDG

marxisttemp 2 days ago | parent | prev [-]

This is the most impressive ChatGPT chat I’ve seen yet. While I theoretically can accept how large-scale probabilistic text generation can lead to this chain of “reasoning”, it really feels like actual intelligence.

HaZeust 2 days ago | parent | next [-]

It's been intelligence for a long time; the goalposts just shift, and people can't abstract the idea to an LLM. But language processing and large data processing itself IS a form of intelligence.

PhilipRoman 2 days ago | parent | prev [-]

Maybe you're right, but I think it's more likely that it had been trained on street view photos and then invented a plausible justification for the guess afterwards (which is something I often see ChatGPT do, when it easily arrives at the correct answer, but gives bullshit explanations for it).

CSMastermind 2 days ago | parent | prev | next [-]

I played a round of GeoGuessr against it, and while it did a shockingly good job compared to what I was expecting, it still lags behind even novice human players.

The locations and its guesses were:

Bliss, Idaho - Burns, Oregon (273 miles away)

Quilleco, Biobio, Chile - Eugene, Oregon (6,411 miles away)

Dettighofen, Switzerland - Mühldorf, Germany (228 miles away)

Pretoria, South Africa - Johannesburg, South Africa (36 miles away)

Rockhampton, Australia - Gold Coast, Australia (437 miles away)
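The error distances above are great-circle ("as the crow flies") figures; a minimal sketch of the standard haversine formula, using assumed approximate coordinates for the Pretoria/Johannesburg round:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Assumed city-centre coordinates: Pretoria vs. Johannesburg
d = haversine_miles(-25.75, 28.23, -26.20, 28.05)
print(d)  # in the low-to-mid 30s of miles, close to the 36 quoted above
```

City-centre coordinates are an assumption here, which is why the result lands near rather than exactly on the 36-mile figure.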

CSMastermind 2 days ago | parent | next [-]

Okay, I decided to benchmark a bunch of AI models with GeoGuessr. One game each on the Diverse World map; here's how they did out of 25,000:

Claude 3.7 Sonnet: 22,759

Qwen2.5-Max: 22,666

o3-mini-high: 22,159

Gemini 2.5 Pro: 18,479

Llama 4 Maverick: 14,316

mistral-large-latest: 10,405

Grok 3: 5,218

Deepseek R1: 0

command-a-03-2025: 0

Nova Pro: 0
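For context on the 25,000 scale: a GeoGuessr game is five rounds at up to 5,000 points each, and the per-round score falls off roughly exponentially with error distance. A sketch of the commonly cited community approximation (GeoGuessr doesn't publish the exact formula, so treat the constants as assumptions):

```python
import math

def round_score(error_km, map_size_km=14916.862):
    """Approximate GeoGuessr round score (max 5000 per round).

    Community-derived approximation, not an official formula:
    score ~ 5000 * exp(-10 * error / map_size), where map_size is
    the map's diagonal in km (~14,917 km for the world map).
    """
    return round(5000 * math.exp(-10 * error_km / map_size_km))

# Five perfect rounds give the 25,000 maximum used in the table above.
print(5 * round_score(0))  # 25000
```

Under this approximation, a 439 km miss (like the Bliss, Idaho round) still scores around 3,700 points, which is why even fairly sloppy guesses can produce totals above 20,000.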

nemo1618 2 days ago | parent | next [-]

Neat, thanks for doing this!

msephton 2 days ago | parent | prev | next [-]

How does Google Lens compare?

CSMastermind 2 days ago | parent [-]

I tried it but as far as I can tell Google Lens doesn't give you a location - it just describes generally what you're looking at.

bn-l 2 days ago | parent | prev [-]

What about o4-mini-high?

CSMastermind 2 days ago | parent [-]

OpenAI's naming confuses me, but I ran o4-mini-2025-04-16 through a game and it got 23,885.

bn-l a day ago | parent [-]

Interesting. That supports what they said (this is the model with good visual reasoning).

jen729w 2 days ago | parent | prev [-]

I just took a picture from my own front porch of the street and the houses opposite. It said 'probably Australia but I'd need more info'.

I said, give me your best guess.

And it guessed Canberra, Australia. Where I'm sitting right now drinking a Martini. Pretty spectacular.

Measter a day ago | parent | prev | next [-]

I gave o4-mini-high a cropped version of a photo I found on Facebook[0][1], and it quickly determined that this was in the UK from the road markings. It also decided that it was from a coastal city because it could see water on the horizon, which is the correct conclusion from incorrect data. There is no water, I think that's trees on a hill. It focused heavily on the spherical structure, which makes sense because it's distinctive, though it had a hard time placing it. It also decided that the building on the left was probably a shopping centre.

It eventually decided that the photo was taken outside the Scottish Exhibition and Conference Centre in Glasgow. It actually generally considered Scottish locations more than others.

The picture was actually taken in Plymouth (so pretty much as far from Scotland as you can get in Britain), on Charles Street looking south-east[2]. The building on the right is Drake Circus, and the one on the left is the Arts University. It actually did consider Plymouth, but decided it didn't match.

[0] This image with the "university plymouth" on the left cropped out, just to make it harder: https://www.facebook.com/photo/?fbid=9719044988151697&set=gm...

[1] https://chatgpt.com/share/68024c91-61d0-800c-99b1-fcecf0bfe8...

[2] https://maps.app.goo.gl/3TXv2UxH5128xQjJ9

delusional 2 days ago | parent | prev [-]

I gave it some photos from Denmark; I didn't even bother to strip the metadata. On one it correctly said it gave off "Scandinavian vibes"; every other photo was very wrong. I also gave it a photo of the French Alps, and it guessed Switzerland.