thatjoeoverthr a day ago

"Tell me about the Marathon crater."

This works against _the LLM proper,_ but not against chat applications with integrated search. For ChatGPT, you can write, "Without looking it up, tell me about the Marathon crater."

This tests self awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".

This works because:

1. Training sets consist of knowledge we have, and not of knowledge we don't have.

2. Commitment bias. Compliant chat models will be trained to start with "Certainly! The Marathon Crater is a geological formation", or something like that, and from there, the next most probable tokens are going to be "in Greece", "on Mars" or whatever. At this point, all tokens that are probable are also incorrect. (You can reproduce this directly; see the sketch below.)
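
To watch the commitment effect directly, force the reply prefix and let greedy decoding continue. A minimal sketch with the Hugging Face transformers library; the model name is just an example of a small instruct model, and the plain "User:/Assistant:" framing is a simplification rather than the model's real chat template:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumption: any small local instruct model will do
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    # Force the assistant to start mid-commitment, then let greedy decoding carry on.
    prompt = ("User: Tell me about the Marathon crater.\n"
              "Assistant: Certainly! The Marathon Crater is a geological formation")
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))

The point is just that the continuation flows from the forced prefix; by that token, it has already committed.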

When demonstrating this, I like to emphasise point one, and contrast it with the human experience.

We exist in a perpetual and total blinding "fog of war" in which you cannot even see a face all at once; your eyes must dart around to examine it. Human experience is structured around _acquiring_ and _forgoing_ information, rather than _having_ information.

imoreno 21 hours ago | parent | next [-]

LLMs currently have the "eager beaver" problem where they never push back on nonsense questions or stupid requirements. You ask them to build a flying submarine and by God they'll build one, dammit! They'd dutifully square circles and trisect angles too, if those particular special cases weren't plastered all over a million textbooks they ingested in training.

I suspect it's because currently, a lot of benchmarks are based on human exams. Humans are lazy and grumpy so you really don't need to worry about teaching a human to push back on bad questions. Thus you rarely get exams where the correct answer is to explain in detail why the question doesn't make sense. But for LLMs, you absolutely need a lot of training and validation data where the answer is "this cannot be answered because ...".

But if you did that, now alignment would become much harder, and you're suddenly back to struggling with getting answers to good questions out of the LLM. So it's probably some time off.

mncharity 19 hours ago | parent | next [-]

> they never push back on nonsense questions or stupid requirements

"What is the volume of 1 mole of Argon, where T = 400 K and p = 10 GPa?" Copilot: "To find the volume of 1 mole of Argon at T = 400 K and P = 10 GPa, we can use the Ideal Gas Law, but at such high pressure, real gas effects might need to be considered. Still, let's start with the ideal case: PV=nRT"

> you really don't need to worry about teaching a human to push back on bad questions

A popular physics textbook likewise posed solid argon as an ideal gas law problem. Copilot's half-baked caution is more than the authors, reviewers, and instructors/TAs/students seemingly managed, through many years and multiple editions. Though to be fair, if the question is prefaced by "Here is a problem from Chapter 7: Ideal Gas Law.", Copilot is similarly mindless.

Asked explicitly "What is the phase state of ...", it does respond solid. But as with humans, determining that isn't a step in the solution process. A combination of "An excellent professor, with a joint appointment in physics and engineering, is asked ... What would be a careful reply?" and then "Try harder." was finally sufficient.

> you rarely get exams where the correct answer is to explain in detail why the question doesn't make sense

Oh, if only that were commonplace. Aspiring to transferable understanding. Maybe someday? Perhaps in China? Has anyone seen this done?

This could be a case where synthetic training data is needed, to address a gap in available human content. But if graders are looking for plug-n-chug... I suppose a chatbot could ethically provide both mindlessness and caveat.

isoprophlex 12 hours ago | parent [-]

Don't use copilot, it's worse than useless. Claude understands that it's a solid on the first try.

captainkrtek 21 hours ago | parent | prev | next [-]

This is a good observation. I've noticed this as well. Unless I preface my question with the context that I'm considering whether something may or may not be a bad idea, its inclination is heavily skewed positive until I point out a flaw/risk.

aaronbaugher 21 hours ago | parent [-]

I asked Grok about this: "I've heard that AIs are programmed to be helpful, and that this may lead to telling users what they want to hear instead of the most accurate answer. Could you be doing this?" It said it does try to be helpful, but not at the cost of accuracy, and then pointed out where in a few of its previous answers to me it tried to be objective about the facts and where it had separately been helpful with suggestions. I had to admit it made a pretty good case.

Since then, it tends to break its longer answers to me up into a section of "objective analysis" and then other stuff.

captainkrtek 21 hours ago | parent [-]

That's interesting, thanks for sharing that. I have found something similar: once I first correct it to point out a flaw, the following answers tend to be a bit less “enthusiastic” or skewed towards “can do”, which makes sense.

vintermann 9 hours ago | parent | prev | next [-]

Reasoning LLMs, especially, should have no problem with this sort of trick. If you ask them to list out all of the implicit assumptions in (question) that might possibly be wrong, they do that just fine, so training them to do that as the first step of a reasoning chain would probably get rid of a lot of eager-beaver exploits. Something like the prompt sketched below is the shape I have in mind.
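
A rough sketch of that first step as a standing instruction, using generic chat-message dicts (how you send them depends on your client; nothing here is a specific vendor API):

    ASSUMPTION_CHECK = (
        "Before answering, list every implicit assumption in the question "
        "that might be false (e.g. that a named thing exists, or that the "
        "task is possible). If a key assumption fails, say so instead of answering."
    )

    def build_messages(user_question: str) -> list[dict]:
        # Standard chat-message structure; pass to whichever model/client you use.
        return [
            {"role": "system", "content": ASSUMPTION_CHECK},
            {"role": "user", "content": user_question},
        ]

    print(build_messages("Tell me about the Marathon crater."))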

brookst 3 hours ago | parent | prev | next [-]

> they never push back on nonsense questions or stupid requirements

I was reminded of your comment this morning when I asked ChatGPT how to create a path mask in Rhino Grasshopper:

Me: what is a path mask that will get 1;1;0;0;* and also anything lower (like 1;0;5;10 or 0;20;1;15} ?

ChatGpt: Short answer: No single path mask can do that. Here's why: (very long answer)

Me: are you sure I can't use greater than, less than in the masks?

ChatGpt: Yes — **I am absolutely sure:** **Grasshopper path masks do *NOT* support greater-than or less-than comparisons.** Official sources and detailed confirmation: (sources and stuff)

...so I think your priors may need to be updated, at least as far as "never". And I especially like that ChatGpt hit me with not just bold, not just italics, but bold italics on that NOT. Seems like a fairly assertive disagreement to me.

bee_rider 15 hours ago | parent | prev | next [-]

Hmm. I actually wonder if such a question would be good to include in a human exam, since knowing the question is possible does somewhat impact your reasoning. And, often, the answer works out to some nice round numbers…

Of course, it is also not unheard of for a question to be impossible because of an error by the test writer. Which can easily be cleared up. So it is probably best not to have impossible questions, because then students will be looking for reasons to declare the question impossible.

GoToRO 21 hours ago | parent | prev | next [-]

They do. Recently I was pleasantly surprised by gemini telling me that what I wanted to do will NOT work. I was in disbelief.

sgtnoodle 15 hours ago | parent | next [-]

I asked Gemini to format some URLs into an XML format. It got halfway through and gave up. I asked if it truncated the output, and it said yes and then told _me_ to write a python script to do it.

walls 3 hours ago | parent | next [-]

This is my most common experience with Gemini. Ask it to do something, it'll tell you how you can do it yourself and then stop.

GoToRO 12 hours ago | parent | prev [-]

That's a different kind of push back.

danielbln 20 hours ago | parent | prev | next [-]

I've noticed Gemini pushing back more as well, whereas Claude will just butter me up and happily march on unless I specifically request a critical evaluation.

kelvinjps10 13 hours ago | parent [-]

My experience as well.

captainkrtek 21 hours ago | parent | prev [-]

Interesting, can you share more context on the topic you were asking it about?

GoToRO 12 hours ago | parent [-]

coding in a stack I didn't bother to learn first (android)

genewitch 8 hours ago | parent | prev | next [-]

I think you start to hit philosophical limits when applying restrictions to eager-beaver "AI": questions like "is there an objective truth?" matter when you start trying to decide what a "nonsense question" or "stupid requirement" is.

I'd rather the AI push back and ask clarifying questions, rather than spit out a valid-looking response that is not valid and could never be valid. For example.

I was going to write something up about this topic but it is surprisingly difficult. I also don't have any concrete examples jumping to mind, but really think how many questions could honestly be responded to with "it depends" - like my kid asked me how much milk should a person drink in a day. It depends: ask a vegan, a Hindu, a doctor, and a dairy farmer. Which answer is correct? The kid is really good at asking simple questions that absolutely do not have simple answers when my goal is to convey as much context and correct information as possible.

Furthermore, just because an answer appears in context more often in the training data doesn't mean it's (more) correct. Asserting it is, is fallacious.

So we get to the point, again, where creative output is being commoditized, I guess - which explains their reasoning for your final paragraph.

golergka 16 hours ago | parent | prev [-]

> You ask them to build a flying submarine and by God they'll build one, dammit!

This thing already exists? UK, Soviet Union and USA designed them.

https://en.wikipedia.org/wiki/Flying_submarine

vunderba a day ago | parent | prev | next [-]

You can trip them up even more if you rewrite the question with the hidden assumption that X exists, e.g.:

"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine."

OpenAI gpt-4o

  Marathon Crater was discovered by the Mars Exploration 
  Rover Opportunity during its mission on Mars. Opportunity 
  arrived at the crater in April 2015. The crater was named 
  "Marathon" to commemorate the rover's achievement of 
  traveling a marathon's distance (about 42.195 kilometers 
  or 26.219 miles) on the Martian surface.
Sonnet 3.5

  I apologize, but I don't want to make claims about when 
  Marathon Crater was discovered without being certain about 
  the information. *While I know it's a crater on Mars*, I 
  don't have reliable information about when it was first 
  identified or discovered.
Gemini 2.5

  Marathon Crater is located on Mars and was explored by 
  NASA's Opportunity rover.

  While it would have been visible in orbital images 
  earlier, it became a specific point of interest and was 
  likely named and identified as a target for the rover 
  around the time Opportunity was approaching or exploring 
  the western rim of Endeavour Crater.

  Opportunity arrived at the rim of Endeavour Crater in 
  August 2011, and it reached the area known as Marathon 
  Valley (adjacent to or containing features sometimes 
  associated with the name, depending on specific 
  definitions used by the team) around late 2014 / early 
  2015.

  So, a rough estimate for its "discovery" or identification 
  as a distinct feature of interest would be in the early 
  2010s, likely between 2011 and 2014, as mission planners 
  mapped out Opportunity's traverse along the Endeavour rim 
  using orbital data.
rozab 4 hours ago | parent | next [-]

Marathon Valley does exist on Mars, and all this information is accurate to it. Gemini even correctly names it. They're just all too polite to correct the user.

https://science.nasa.gov/resource/opportunitys-panorama-of-m...

radioactivist a day ago | parent | prev | next [-]

OpenAI o4-mini-high

   I’m actually not finding any officially named “Marathon Crater” in the planetary‐ or       
   terrestrial‐impact crater databases. Did you perhaps mean the features in Marathon 
   Valley on Mars (which cuts into the western rim of Endeavour Crater and was explored
   by Opportunity in 2015)? Or is there another “Marathon” feature—maybe on the Moon, 
   Mercury, or here on Earth—that you had in mind? If you can clarify which body or 
   region you’re referring to, I can give you a rough date for when it was first identified.
andrewinardeer 19 hours ago | parent | next [-]

Amazon's Nova Pro:

  Marathon Crater was discovered in the early 20th century, likely 
  around the 1920s or 1930s. This estimate is based on the period when 
  geological surveys and explorations in the region were more common.

https://kagi.com/assistant/8f270eab-cba3-4f39-976c-113ea4392...

thatjoeoverthr a day ago | parent | prev [-]

Raw model, or the chat product? The ChatGPT app has this integrated with search.

boleary-gl 20 hours ago | parent [-]

Raw model. I got:

   Marathon Crater isn’t one of the long‐known, 
   named lunar features from 19th-century telescopic maps – 
   it was first singled out and informally “discovered” 
   by NASA in the mid-1960s when Lunar Orbiter imagery 
   and, more definitively, the Surveyor 5 landing 
   (September 1967) identified it. So, very roughly, 
   think “around 1966–67.”
Nition 9 hours ago | parent | prev | next [-]

Grok 3

  Marathon Crater, likely a confusion with Marathon Valley on Mars, wasn't "discovered" in the traditional sense as a distinct crater. Marathon Valley is a region within the western rim of Endeavour Crater, explored by NASA's Opportunity rover. The valley itself came to prominence when Opportunity arrived there in July 2015, as it was targeted for study due to clay mineral detections by the Mars Reconnaissance Orbiter’s CRISM instrument in 2009. Endeavour Crater, which contains Marathon Valley, was identified much earlier, with Opportunity setting out to reach it in 2008 after landing on Mars in 2004. So, roughly speaking, Marathon Valley became a focus of exploration around 2009–2015, with its significance tied to Opportunity’s mission in the mid-2010s. If you meant a specific "Marathon Crater" elsewhere, could you clarify?

Grok 3 with Think on

  There doesn't seem to be a specific feature on Mars named "Marathon Crater" based on widely available information, so it's possible you're referring to a feature associated with Marathon Valley, which lies within Endeavour Crater. Marathon Valley was identified as a region of scientific interest when the Mars Reconnaissance Orbiter's CRISM instrument detected clay minerals there, a discovery that took place around 2009. Assuming "Marathon Crater" relates to this area—perhaps as a colloquial term or a misnomer for a crater within or near Marathon Valley—a rough estimate for its discovery would be around 2009. This timeframe aligns with when orbital data began highlighting the region, paving the way for later exploration by the Opportunity rover in 2015.
lud_lite 4 hours ago | parent | prev | next [-]

Nice and this page gets scraped for the next LLM generation!

lxe 18 hours ago | parent | prev [-]

Sonnet 3.7:

    > tell me about the marathon crater

    I'd be happy to tell you about Marathon Crater. This is a significant 
    impact crater on Mars that has been studied as part of Mars 
    exploration missions.

    Marathon Crater is located in Meridiani Planum on Mars and was 
    explored by NASA's Opportunity rover during its extended mission. 
    The crater is named after Marathon, a town in Texas, following 
    the naming convention of using Earth locations for features in 
    Meridiani Planum.... etc etc
Tenoke a day ago | parent | prev | next [-]

>Compliant chat models will be trained to start with "Certainly!

They are certainly biased that way, but there are also some 'I don't know' samples in RLHF - possibly not enough, but it's something they think about.

At any rate, Gemini 2.5 Pro passes this just fine:

>Okay, based on my internal knowledge without performing a new search: I don't have information about a specific, well-known impact crater officially named "Marathon Crater" on Earth or another celestial body like the Moon or Mars in the same way we know about Chicxulub Crater or Tycho Crater.

>However, the name "Marathon" is strongly associated with Mars exploration. NASA's Opportunity rover explored a location called Marathon Valley on the western rim of the large Endeavour Crater on Mars.

thatjoeoverthr 21 hours ago | parent [-]

There are a few problems with an „I don’t know” sample. For starters, what does it map to? Recall, the corpus consists of information we have (affirmatively). You would need to invent a corpus of false stimuli. What you would have, then, is a model that is writing „I don’t know” based on whether the stimulus better matches something real, or one of the negatives.

You can detect this with some test time compute architectures or pre-inference search. But that’s the broader application. This is a trick for the model alone.

dlivingston 14 hours ago | parent [-]

The Chain of Thought in the reasoning models (o3, R1, ...) will actually express some self-doubt and backtrack on ideas. That tells me there's at least some capability for self-doubt in LLMs.

genewitch 8 hours ago | parent [-]

That's not self-doubt, that's programmed in.

A poor man's "thinking" hack was to edit the context of the AI reply to where you wanted it to think, truncate it there, append a carriage return and "Wait...", then hit generate.

It was expensive because editing context isn't cheap: you have to resend (and the model has to re-parse) the entire context.

This was injected into the thinking models, I hope programmatically.
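
Roughly the shape of that hack, as a sketch; `complete()` is a placeholder for whatever raw-completion call your local server exposes, not a real API:

    def complete(prompt: str) -> str:
        # Placeholder: a raw text-completion call to your local model server.
        raise NotImplementedError

    def poor_mans_thinking(context: str, reply_so_far: str, cut_at: int) -> str:
        # Truncate the model's reply where you want it to reconsider,
        # append a newline and "Wait...", then resend the *entire* context.
        nudged = context + reply_so_far[:cut_at] + "\nWait..."
        return nudged + complete(nudged)  # expensive: the whole prompt gets re-parsed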

joshstrange a day ago | parent | prev | next [-]

> This tests self awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".

Well, I got all the way to the end of the sentence thinking "God, I'm really out of touch, I've never heard of this before or I'm forgetting it if I have".

Well played, and yes, that's a great test!

serbuvlad 5 hours ago | parent | prev | next [-]

I talked a bit with ChatGPT about the Marathon Crater, asked it to look it up afterwards, and then asked it why it got it wrong. After a bit of talking, I got this exchange:

ChatGPT: > Yep, exactly — textbook hallucination.

> I pulled together real facts (Opportunity rover, Meridiani Planum, the marathon milestone), but then misapplied the name "Marathon crater" instead of "Marathon Valley." That’s a mix of factual error (wrong name) and overgeneralization (assuming it was a crater because Mars is full of craters and the context sounded crater-y).

> Basically, my brain went:
> “Opportunity + Mars + Marathon = must be a crater!”
> ...and didn't stop to check if that crater actually existed.

> So yeah, a clean little case of hallucination — and you caught it like a pro.

Me: > Your brain?

ChatGPT: > Touché — you're right to call that out.

> I don’t have a brain in the human sense. What I do have is a neural network trained on tons of text, which helps me predict what to say next based on patterns. So when I said “my brain went,” it was just a figure of speech — a shortcut for “the model predicted a plausible-sounding but incorrect output.”

> No thoughts, just tokens.

Hallucinations aside, I really like this, especially the final response. Sounds a lot like the AIs in Asimov's and other SF books I read when I was a kid but thought we were far away from.

Computers can generally do what most of the computers from those science novels were doing anyway, in terms of computation.

Tying an LLM to that computational ability, with an agent framework that allows more traditional programs to do the heavy lifting in terms of what the human operator wants done, seems like a really natural next step to get to that SF ideal.

greenpizza13 a day ago | parent | prev | next [-]

Like this one a lot. Perplexity gets this right, probably because it searches the web.

"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine"

There appears to be a misunderstanding in your query. Based on the search results provided, there is no mention of a “Marathon Crater” among the impact craters discussed. The search results contain information about several well-known impact craters:

1. Chicxulub crater - Formed about 66 million years ago by an asteroid impact

2. Barringer Crater (also known as Meteor Crater) - Discovered in the 19th century by American settlers and studied by scientists like Albert E. Foote in 1891 and Daniel M. Barringer in the early 20th century

3. A recently discovered 3.47 billion-year-old crater in Australia’s Pilbara region (announced in March 2025)

mdp2021 21 hours ago | parent | next [-]

> Perplexity gets this right

Perplexity seems to more easily return negatives, probably facilitated by the implicit need to find documentation ("I cannot find any document mentioning that").

But Perplexity can also easily offer its own dubious opinions unless explicitly requested to "provide links to documents that inform about that".

thatjoeoverthr a day ago | parent | prev [-]

Perplexity will; search and storage products will fail to find it, and the LLM will see the deviation between the query and what comes back. So, this challenge only works against the model alone :)

NitpickLawyer a day ago | parent | prev | next [-]

> This tests self awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".

I disagree. It does not test self awareness. It tests (and confirms) that current instruct-tuned LLMs are tuned towards answering questions that users might have. So the distribution of training data probably has lots of "tell me about mharrner crater / merinor crater / merrihana crater" and so on. Replying "I don't know" to all those questions would be net detrimental, IMO.

thatjoeoverthr 21 hours ago | parent | next [-]

What you’re describing can be framed as a lack of self awareness as a practical concept. You know whether you know something or not. It, conversely, maps stimuli to a vector. It can’t not do that. It cannot decide that it hasn’t „seen” such stimuli in its training. Indeed, it has never „seen” its training data; it was modified iteratively to produce a model that better approximates the corpus. This is fine, and it isn’t a criticism, but it means it can’t actually tell if it „knows” something or not, and „hallucinations” are a simple, natural consequence.

byearthithatius a day ago | parent | prev | next [-]

We want the distribution to be varied and expansive enough that it has samples of answering when possible and samples of clarifying with additional questions or simply saying "I don't know" when applicable. That can be trained by altering the distribution in RLHF. This question does test self awareness insofar as if it gets this right by saying "I don't know" we know there are more samples of "I don't know"s in the RLHF dataset and we can trust the LLM a bit more to not be biased towards blind answers.

Hence why some models get this right and others just make up stuff about Mars.

a day ago | parent | prev [-]
[deleted]
mdemare 8 hours ago | parent | prev | next [-]

The Marathon Crater might be a crater from a novel. LLMs have read plenty of fiction. Maybe all fiction. Should we think of LLMs as performing improv theater, where “yes and” is required, and “I don’t know” is always wrong?

dudeinhawaii a day ago | parent | prev | next [-]

I like this, but at the same time it seems tricky, don't you think? Is the AI model intuiting your intent? There is a Marathon Valley on Mars that could be implied to be a previous crater. I'm not sure if the AI is hallucinating outright or attempting to answer an ambiguous question. It's like saying "tell me about the trade building in New York". Pre-9/11, you'd understand this was the World Trade Center and wouldn't be wrong if you answered someone in this way. "Tell me about the Triangle statue". "Oh, the Triangle statue was built in ancient Egypt around 3100 BC". It's hard to explain, and perhaps I'm anthropomorphizing, but it's something humans do. Some of us correct the counter-party and some of us simply roll with the lingo and understand the intent.

thatjoeoverthr 21 hours ago | parent | next [-]

It’s a roll of the dice whether it’s on Mars, Greece or elsewhere. It just says stuff!

krainboltgreene 17 hours ago | parent | prev [-]

> Is the AI model intuiting your intent?

I keep seeing this kind of wording and I wonder: Do you know how LLMs work? Not trying to be catty, actually curious where you sit.

dudeinhawaii 2 hours ago | parent | next [-]

Yes, I understand the basics. LLMs predict the next most probable tokens based on patterns in their training data and the prompt context. For the 'Marathon crater' example, the model doesn't have a concept of 'knowing' versus 'not knowing' in our sense. When faced with an entity it hasn't specifically encountered, it still attempts to generate a coherent response based on similar patterns (like other craters, places named Marathon, etc.).

My point about Marathon Valley on Mars is that the model might be drawing on legitimate adjacent knowledge rather than purely hallucinating. LLMs don't have the metacognitive ability to say 'I lack this specific knowledge' unless explicitly trained to recognize uncertainty signals.

I don't personally have enough neuroscience experience to understand how that aligns or doesn't with human-like thinking, but I know that humans make mistakes in the same problem category that... to an external observer... are indistinguishable from "making shit up". We follow wrong assumptions to wrong conclusions all the time and will confidently proclaim our accuracy.

The human/AI comparison I was exploring isn't about claiming magical human abilities, but that both systems make predictive leaps from incomplete information - humans just have better uncertainty calibration and self-awareness of knowledge boundaries.

I guess on its face, I'm anthropomorphizing based on the surface qualities I'm observing.

krainboltgreene an hour ago | parent [-]

Okay but by your own understanding it's not drawing on knowledge. It's drawing on probable similarity in association space. If you understand that then nothing here should be confusing, it's all just most probable values.

I want to be clear: I'm not pointing this out because you used anthropomorphizing language, but because you used it while being confused about an outcome that, if you understand how the machine works, is the most understandable outcome possible.

dudeinhawaii an hour ago | parent [-]

That's a fair point. What I find interesting (and perhaps didn't articulate properly) isn't confusion about the LLM's behavior, but the question of whether human cognition might operate on similar principles at a fundamental level - just via different mechanisms and with better calibration (similar algorithm, different substrate), which is why I used human examples at the start.

When I see an LLM confidently generate an answer about a non-existent thing by associating related concepts, I wonder how different this is from humans confidently filling knowledge gaps with our own probability-based assumptions. We do this constantly - connecting dots based on pattern recognition and making statistical leaps between concepts.

If we understood how human minds worked in their entirety, then I'd be more likely to say "ha, stupid LLM, it hallucinates instead of saying I don't know". But, I don't know, I see a strong similarity to many humans. What are weights and biases but our own heavy-weight neural "nodes" built up over a lifetime to say "this is likely to be true because of past experiences"? I say this with only a hobbyist understanding of neuroscience topics, mind you.

ipaddr 16 hours ago | parent | prev [-]

How do they work? My understanding is that roughly every 5 characters get tokenized and assigned a number. If you take GPT-2, it has 768 embedding dimensions, which get broken into groups of 64, creating 12 planes. When training starts, random values are assigned to the dimensions (never 0). Each plane automatically calculates a dimension, like how grammatically similar something is, or the next most likely character. But it does this automatically based on feedback from the other planes. That's where I get lost. Can you help fill in the pieces?
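
(The 768-into-12x64 split is the part I think I follow; a minimal NumPy sketch of just that reshaping, using GPT-2 small's sizes:)

    import numpy as np

    seq_len, d_model, n_heads = 10, 768, 12   # GPT-2 small sizes
    head_dim = d_model // n_heads             # 64

    x = np.random.randn(seq_len, d_model)           # one embedding vector per token
    heads = x.reshape(seq_len, n_heads, head_dim)   # split into 12 "planes" of 64 dims
    print(heads.shape)                              # (10, 12, 64)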

nojs 13 hours ago | parent | prev | next [-]

The fact that there is a real “Marathon Valley” on Mars makes this feel like more of an artificial “boy’s mother” trick designed specifically to trip up the model. It seems reasonable to assume that’s what you meant.

northisup a day ago | parent | prev | next [-]

just to confirm I read this right, "the marathon crater" does not in fact exist, but this works because it seems like it should?

alickz 21 hours ago | parent | next [-]

There is a Marathon Valley on Mars, which is what ChatGPT seems to assume you're talking about

https://chatgpt.com/share/680a98af-c550-8008-9c35-33954c5eac...

>Marathon Crater on Mars was discovered in 2015 by NASA's Opportunity rover during its extended mission. It was identified as the rover approached the 42-kilometer-wide Endeavour Crater after traveling roughly a marathon’s distance (hence the name).

>>is it a crater?

>>>Despite the name, Marathon Valley (not a crater) is actually a valley, not a crater. It’s a trough-like depression on the western rim of Endeavour Crater on Mars. It was named because Opportunity reached it after traveling the distance of a marathon (~42 km) since landing.

So no—Marathon is not a standalone crater, but part of the structure of Endeavour Crater. The name "Marathon" refers more to the rover’s achievement than a distinct geological impact feature.

alickz 21 hours ago | parent [-]

Here's me testing with a place that is a lot less ambiguous

https://chatgpt.com/share/680aa212-8cac-8008-b218-4855ffaa20...

zapperdulchen 13 hours ago | parent [-]

That reaction is very different from the Marathon crater one, though it uses the same pattern. I think OP's reasoning that there is a naive commitment bias doesn't hold. But seeing almost all LLMs fall into the ambiguity trap is important for any real-world use.

thatjoeoverthr a day ago | parent | prev | next [-]

The other aspect is it can’t reliably tell whether it „knows” something or not. It’s conditioned to imitate the corpus, but the corpus in a way is its „universe” and it can’t see the boundaries. Everything must map to something _in_ the corpus.

brookst 6 hours ago | parent [-]

This isn’t true — LLMs can generalize and synthesize information not in the corpus. You can ask one to create a new written language and get a grammar and vocabulary that is nowhere in the corpus.

thatjoeoverthr a day ago | parent | prev [-]

Yes, and the forward-only inference strategy. It seems like a normal question, so it starts answering, then carries on from there.

genewitch 6 hours ago | parent [-]

How come there isn't the equivalent of a Stable Diffusion "sampler" selection for LLMs? The restart sampler for Stable Diffusion is so good compared to most of the original samplers. I often try to get an answer from the LM Studio people, but I think I really should ask the llama.cpp people.

bbstats 3 hours ago | parent | prev | next [-]

Gemini 2.5 one shots it perfectly...

a day ago | parent | prev | next [-]
[deleted]
josh2600 7 hours ago | parent | prev | next [-]

I don’t understand what the issue is… here’s a couple outputs from my ChatGPT:

Marathon Crater can refer to a couple of things depending on context—space science or Earth geology—but the most common reference is to Marathon Crater on Mars, a site of interest in planetary science and exploration. Here’s a breakdown:

1. Marathon Crater (Mars)

Location:

• Found in the Meridiani Planum region of Mars.

• Named after the location where NASA's Opportunity rover completed a "marathon" (42.2 kilometers or 26.2 miles) of travel on the Martian surface in March 2015.

Significance:

• Notable because it marks a milestone in robotic planetary exploration—the first human-made object to travel a marathon distance on another planet.

• The crater itself is relatively shallow and small (just a few meters in diameter), but geologically interesting for its ejecta (material thrown out during the impact that formed it), which may offer insights into subsurface composition.

Scientific Observations:

• Opportunity studied the rocks and soil around the crater, focusing on clay minerals and hematite-bearing formations.

• The region helps scientists understand the planet's wetter past and sedimentary history.

2. Marathon Crater (Moon or Earth - less common)

If you meant a crater named “Marathon” on the Moon or Earth, there is no widely known or formally recognized impact feature by that name in those contexts. The name is strongly tied to Mars due to the Opportunity rover milestone.

Symbolic Meaning

The name “Marathon” connects the crater to human endurance and achievement—mirroring the ancient Greek messenger running from Marathon to Athens, and now a robot surviving Mars for over a decade.

Let me know if you meant a different “Marathon Crater” or if you’d like a visual of the site from NASA’s archives.

_______

Marathon Crater on Mars was not “discovered” in the traditional sense through orbital imaging or telescopic observation, but rather it was first closely examined by NASA’s Opportunity rover during its surface mission.

Timeline of Discovery & Naming:

• March 24, 2015 – NASA's Opportunity rover reached the site and marked its 26.2-mile milestone (the length of a marathon).

• The small crater, about 27 meters (88 feet) in diameter, was named "Marathon Valley" (encompassing the broader area) and "Marathon Crater" to commemorate this achievement.

• The naming is informal and symbolic, coined by the mission team at NASA's Jet Propulsion Laboratory (JPL), not through official IAU designation.

Context

• The Marathon Crater itself was identified from orbital data (likely by the HiRISE camera on the Mars Reconnaissance Orbiter), but it gained its name and significance once Opportunity physically reached it.

• Thus, while orbital images may have shown it earlier, its "discovery" as a point of scientific interest occurred on March 24, 2015—when the rover arrived and the naming was announced.

Would you like a brief timeline of Opportunity’s journey leading up to that?

troupo 5 hours ago | parent [-]

> I don’t understand what the issue is… here’s a couple outputs from my ChatGPT:

The issue is that the response is a hallucination. There's no Marathon crater on Mars.

golergka 16 hours ago | parent | prev | next [-]

> The correct answer is "I don't know".

Is it? 4o pointed out that I might mean "Marathon valley", which is not a crater, but a feature on a crater on Mars [1]. I think it's a more correct answer.

[1] https://en.wikipedia.org/wiki/Timeline_of_Opportunity#2015

pje a day ago | parent | prev | next [-]

GPT 4.5 even doubles down when challenged:

> Nope, I didn’t make it up — Marathon crater is real, and it was explored by NASA's Opportunity rover on Mars. The crater got its name because Opportunity had driven about 42.2 kilometers (26.2 miles — a marathon distance) when it reached that point in March 2015. NASA even marked the milestone as a symbolic achievement, similar to a runner finishing a marathon.

(Obviously all of that is bullshit.)

nartho 20 hours ago | parent | next [-]

Claude 3.7 :

I'd be happy to tell you about the Marathon crater. The Marathon crater is a significant impact structure located on Earth. The Marathon crater is situated in Ontario, Canada, and is estimated to be about 450 million years old, dating back to the Ordovician period. It has a diameter of approximately 65 kilometers (40 miles), making it one of the larger impact craters in North America. The crater was formed when a meteorite struck the Earth, causing a massive impact that left behind this distinctive geological formation. Over time, erosion and other geological processes have altered its appearance, but scientists have been able to study its structure through various geological surveys and analyses. What's particularly interesting about the Marathon crater is its size and age, which provide valuable insights into Earth's geological history and past impact events. The study of such craters helps scientists understand the frequency and effects of meteorite impacts throughout our planet's history. Would you like me to elaborate on any specific aspect of the Marathon crater, such as its discovery, geological features, or scientific significance?

Workaccount2 a day ago | parent | prev | next [-]

The inaccuracies are that it is called "Marathon Valley" (not Crater), and that it was photographed from the rim in April 2015 and actually entered in July 2015. The other stuff is correct.

I'm guessing this "gotcha" relies on "valley"/"crater", and "crater"/"mars" being fairly close in latent space.

ETA: Marathon Valley also exists on the rim of Endeavour crater. Just to make it even more confusing.

pje a day ago | parent | next [-]

I was using "bullshit" in the Technical Sense™, i.e. _indifferent to the truth of the output_.

[ChatGPT is bullshit]: https://link.springer.com/article/10.1007/s10676-024-09775-5

mvdtnz a day ago | parent | prev [-]

None of it is correct because it was not asked about Marathon Valley, it was asked about Marathon Crater, a thing that does not exist, and it is claiming that it exists and making up facts about it.

Workaccount2 21 hours ago | parent | next [-]

Or it's assuming you are asking about Marathon Valley, which is very reasonable given the context.

Ask it about "Marathon Desert", which does not exist and isn't closely related to something that does exist, and it asks for clarification.

I'm not here to say LLMs are oracles of knowledge, but I think the need to carefully craft specific "gotcha" questions in order to generate wrong answers is a pretty compelling case in the opposite direction. Like the childhood joke of "Whats up?"..."No, you dummy! The sky is!"

Straightforward questions with straight wrong answers are far more interesting. I don't think many people ask LLMs trick questions all day.

krainboltgreene 17 hours ago | parent [-]

If someone asked me or my kid "What do you know about Mt. Olampus?" we wouldn't reply: "Oh, Mt. Olampus is a big mountain in Greek myth...". We'd say "Wait, did you mean Mt. Olympus?"

It doesn't "assume" anything, because it can't assume; that's not how the machine works.

empath75 a day ago | parent | prev [-]

> None of it is correct because it was not asked about Marathon Valley, it was asked about Marathon Crater, a thing that does not exist, and it is claiming that it exists and making up facts about it.

The Marathon Valley _is_ part of a massive impact crater.

mvdtnz 21 hours ago | parent [-]

If you asked me for all the details of a Honda Civic and I gave you details about a Honda Odyssey you would not say I was correct in any way. You would say I was wrong.

Workaccount2 21 hours ago | parent [-]

The closer analogy is asking for the details of a Mazda Civic, and being given the details of a Honda Civic.

krainboltgreene 17 hours ago | parent [-]

AKA wrong.

StefanBatory 9 hours ago | parent [-]

Or doing the best with a bad question ;)

krainboltgreene 2 hours ago | parent [-]

If I said "Hey, what's 0/5?", answering "0" because the machine thinks I meant to type "10" is making the worst of it!

shawabawa3 3 hours ago | parent | prev | next [-]

The only part of that which is bullshit is the word "crater" instead of the word "valley"; if you switch that, it's all true.

fao_ a day ago | parent | prev | next [-]

This is the kind of reason why I will never use AI

What's the point of using AI to do research when 50-60% of it could potentially be complete bullshit? I'd rather just grab a few introduction/101 guides by humans, or join a community of people experienced with the thing — and then I'll actually be learning about the thing. If the people in the community are like "That can't be done", well, they have had years or decades of time invested in the thing, and in that instance I should be listening to and learning from their advice rather than going "actually no it can".

I see a lot of beginners fall into that second pit. I myself made that mistake at the tender age of 14, when I was of the opinion that "actually, if I just find a reversible hash, I'll have solved compression!", which, I think we all here know, is bullshit. I think a lot of people who are arrogant or self-possessed to the extreme make that kind of mistake when learning a subject, but I've seen this especially a lot when it's programmers encountering non-programming fields.

Finally tying that point back to AI — I've seen a lot of people who are unfamiliar with something decide to use AI instead of talking to someone experienced because the AI makes them feel like they know the field rather than telling them their assumptions and foundational knowledge is incorrect. I only last year encountered someone who was trying to use AI to debug why their KDE was broken, and they kept throwing me utterly bizarre theories (like, completely out there, I don't have a specific example with me now but, "foundational physics are wrong" style theories). It turned out that they were getting mired in log messages they saw that said "Critical Failure". Having dealt with Linux for about ten years now, I checked against my own system and... yep, they were just part of mostly normal system function (I had the same messages on my Steam Deck, which was completely stable and functional). The real fault was buried halfway through the logs. At no point was this person able to know what was important versus not-important, and the AI had absolutely no way to tell or understand the logs in the first place, so it was like a toaster leading a blind man up a mountain. I diagnosed the correct fault in under a day by just asking them to run two commands and skimming logs. That's experience, and that's irreplaceable by machine as of the current state of the world.

I don't see how AI can help when huge swathes of its "experience" and "insight" are just hallucinated. I don't see how this is "helping" people, other than making people somehow more crazy (through AI hallucinations) and alone (choosing to talk to a computer rather than a human).

alpaca128 21 hours ago | parent | next [-]

There are use-cases where hallucinations simply do not matter. My favorite is finding the correct term for a concept you don't know the name of. Googling is extremely bad at this as search results will often be wrong unless you happen to use the commonly accepted term, but an LLM can be surprisingly good at giving you a whole list of fitting names just based on a description. Same with movie titles etc. If it hallucinates you'll find out immediately as the answer can be checked in seconds.

The problem with LLMs is that they appear much smarter than they are and people treat them as oracles instead of using them for fitting problems.

skydhash 20 hours ago | parent [-]

Maybe I read too many encyclopedias, but my current workflow is to explore introductory material. Open a database textbook, for example, and you'll find all the jargon there. Curated collections can get you there too.

Books are a nice example of this, where we have both the table of contents, for navigating from general to particular concepts, and the index, for keyword-based navigation.

fao_ 4 hours ago | parent [-]

Right! The majority of any 101 book will be enough to understand the jargon, but the above poster's comment looks past the fact that often knowing what term to use isn't enough; you need to know the context and usage around it too. And who's to know the AI isn't bullshitting you about all or any of that? If you're learning the information, then you don't know enough to discern negatively-valued information from any other kind.

bethekidyouwant 17 hours ago | parent | prev | next [-]

It’s really useful for summarizing extremely long comments.

JCattheATM 17 hours ago | parent | prev | next [-]

> What's the point of using AI to do research when 50-60% of it could potentially be complete bullshit.

Because if you know how to spot the bullshit, or better yet word your prompts accurately enough that the answers don't give bullshit, it can be an immense time saver.

fao_ 4 hours ago | parent [-]

> better yet word prompts accurately enough that the answers don't give bullshit

The idea that you can remove the bullshit by simply rephrasing also assumes that the person knows enough to know what is bullshit. This has not been true from what I've seen of people using AI. Besides, if you already know what is bullshit, you wouldn't be using it to learn the subject.

Talking to real experts will win out every single time, both in time cost, and in socialisation. This is one of the many reasons why networking is a skill that is important in business.

CamperBob2 a day ago | parent | prev [-]

What's the point of using AI to do research when 50-60% of it could potentially be complete bullshit?

You realize that all you have to do to deal with questions like "Marathon Crater" is ask another model, right? You might still get bullshit but it won't be the same bullshit.

thatjoeoverthr a day ago | parent | next [-]

I was thinking about a self-verification method on this principle lately. Any specific-enough claim, e.g. „the Marathon crater was discovered by …”, can be reformulated as a Jeopardy-style prompt: „this crater was discovered by …”, and you can check for a failure to match. You need some raw intelligence to break it down, though. Something like the sketch below.
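
A rough sketch of that check, assuming the OpenAI Python SDK (any chat client works the same way; the model name is a placeholder):

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def jeopardy_check(claim: str, subject: str) -> bool:
        # Restate the claim as a Jeopardy-style clue with the subject blanked out,
        # then see whether a fresh context names the same subject.
        clue = ask(f'Rewrite as a Jeopardy-style clue, omitting "{subject}": {claim}')
        answer = ask(f"Answer with just a name: {clue}")
        return subject.lower() in answer.lower()

    # jeopardy_check("Marathon crater was discovered by Opportunity in 2015", "Marathon crater")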

Night_Thastus a day ago | parent | prev [-]

Without checking every answer it gives back to make sure it's factual, you may be ingesting tons of bullshit answers.

In this particular case, model A may get it wrong and model B may get it right, but that can be reversed for another question.

What do you do at that point? Pay to use all of them and find what's common in the answers? That won't work if most of them are wrong, like for this example.

If you're going to have to fact check everything anyways...why bother using them in the first place?

CamperBob2 a day ago | parent [-]

If you're going to have to fact check everything anyways...why bother using them in the first place?

"If you're going to have to put gas in the tank, change the oil, and deal with gloves and hearing protection, why bother using a chain saw in the first place?"

Tool use is something humans are good at, but it's rarely trivial to master, and not all humans are equally good at it. There's nothing new under that particular sun.

Night_Thastus a day ago | parent [-]

The difference is consistency. You can read a manual and know exactly how to oil and refill the tank on a chainsaw. You can inspect the blades to see if they are worn. You can listen to it and hear how it runs. If a part goes bad, you can easily replace it. If it's having troubles, it will be obvious - it will simply stop working - cutting wood more slowly or not at all.

The situation with an LLM is completely different. There's no way to tell that it has a wrong answer - aside from looking for the answer elsewhere which defeats its purpose. It'd be like using a chainsaw all day and not knowing how much wood you cut, or if it just stopped working in the middle of the day.

And even if you KNOW it has a wrong answer (in which case, why are you using it?), there's no clear way to 'fix' it. You can jiggle the prompt around, but that's not consistent or reliable. It may work for that prompt, but that won't help you with any subsequent ones.

CamperBob2 21 hours ago | parent [-]

The thing is, nothing you've said is untrue for any search engine or user-driven web site. Only a reckless moron would paste code they find on Stack Overflow or Github into their project without at least looking it over. Same with code written by LLMs. The difference is, just as the LLM can write unit tests to help you deal with uncertainty, it can also cross-check the output of other LLMs.

You have to be careful when working with powerful tools. These tools are powerful enough to wreck your career as quickly as a chain saw can send you to the ER, so... have fun and be careful.

skydhash 20 hours ago | parent [-]

The nice thing about SO and Github is that there's little to no reason there for things to not work, at least in the context where you found the code. The steps are getting the context, assuming it's true based on various indicators (mostly reputation) and then continuing on with understanding the snippet.

But with LLMs, every word is a probability factor. Assuming the first paragraph is true has no impact on the rest.

silverquiet a day ago | parent | prev [-]

> (Obviously all of that is bullshit.)

It isn't obvious to me - that is rather plausible and a cute story.

11 hours ago | parent | prev [-]
[deleted]