| ▲ | pje a day ago |
| GPT 4.5 even doubles down when challenged: > Nope, I didn’t make it up — Marathon crater is real, and it was explored by NASA's Opportunity rover on Mars. The crater got its name because Opportunity had driven about 42.2 kilometers (26.2 miles — a marathon distance) when it reached that point in March 2015. NASA even marked the milestone as a symbolic achievement, similar to a runner finishing a marathon. (Obviously all of that is bullshit.) |
|
| ▲ | nartho 20 hours ago | parent | next [-] |
| Claude 3.7 : I'd be happy to tell you about the Marathon crater. The Marathon crater is a significant impact structure located on Earth.
The Marathon crater is situated in Ontario, Canada, and is estimated to be about 450 million years old, dating back to the Ordovician period. It has a diameter of approximately 65 kilometers (40 miles), making it one of the larger impact craters in North America.
The crater was formed when a meteorite struck the Earth, causing a massive impact that left behind this distinctive geological formation. Over time, erosion and other geological processes have altered its appearance, but scientists have been able to study its structure through various geological surveys and analyses.
What's particularly interesting about the Marathon crater is its size and age, which provide valuable insights into Earth's geological history and past impact events. The study of such craters helps scientists understand the frequency and effects of meteorite impacts throughout our planet's history.
Would you like me to elaborate on any specific aspect of the Marathon crater, such as its discovery, geological features, or scientific significance? |
|
| ▲ | Workaccount2 a day ago | parent | prev | next [-] |
| The inaccuracies are that it is called "Marathon Valley" (not crater), and that it was photographed from the rim in April 2015 and actually entered in July 2015. The other stuff is correct. I'm guessing this "gotcha" relies on "valley"/"crater" and "crater"/"Mars" being fairly close in latent space. ETA: Marathon Valley also sits on the rim of Endeavour crater, just to make it even more confusing. |
| |
| ▲ | pje a day ago | parent | next [-] | | I was using "bullshit" in the Technical Sense™, i.e. _indifferent to the truth of the output_. [ChatGPT is bullshit]: https://link.springer.com/article/10.1007/s10676-024-09775-5 | |
| ▲ | mvdtnz a day ago | parent | prev [-] | | None of it is correct because it was not asked about Marathon Valley, it was asked about Marathon Crater, a thing that does not exist, and it is claiming that it exists and making up facts about it. | | |
| ▲ | Workaccount2 21 hours ago | parent | next [-] | | Or it's assuming you are asking about Marathon Valley, which is very reasonable given the context. Ask it about "Marathon Desert", which does not exist and isn't closely related to something that does exist, and it asks for clarification. I'm not here to say LLMs are oracles of knowledge, but I think the need to carefully craft specific "gotcha" questions in order to generate wrong answers is a pretty compelling case in the opposite direction. Like the childhood joke of "What's up?"..."No, you dummy! The sky is!" Straightforward questions with straight wrong answers are far more interesting. I don't think many people ask LLMs trick questions all day. | | |
| ▲ | krainboltgreene 17 hours ago | parent [-] | | If someone asked me or my kid "What do you know about Mt. Olampus?" we wouldn't reply: "Oh, Mt. Olampus is a big mountain in Greek myth...". We'd say "Wait, did you mean Mt. Olympus?" It doesn't "assume" anything, because it can't assume; that's not how the machine works. |
| |
| ▲ | empath75 21 hours ago | parent | prev [-] | | > None of it is correct because it was not asked about Marathon Valley, it was asked about Marathon Crater, a thing that does not exist, and it is claiming that it exists and making up facts about it. The Marathon Valley _is_ part of a massive impact crater. | | |
| ▲ | mvdtnz 21 hours ago | parent [-] | | If you asked me for all the details of a Honda Civic and I gave you details about a Honda Odyssey you would not say I was correct in any way. You would say I was wrong. | | |
| ▲ | Workaccount2 21 hours ago | parent [-] | | The closer analogy is asking for the details of a Mazda Civic, and being given the details of a Honda Civic. | | |
|
|
|
|
|
| ▲ | shawabawa3 3 hours ago | parent | prev | next [-] |
| The only part of that which is bullshit is the word "crater" instead of the word "valley"; if you switch that, it's all true. |
|
| ▲ | fao_ a day ago | parent | prev | next [-] |
| This is the kind of reason why I will never use AI. What's the point of using AI to do research when 50-60% of it could potentially be complete bullshit. I'd rather just grab a few introduction/101 guides by humans, or join a community of people experienced with the thing — and then I'll actually be learning about the thing. If the people in the community are like "That can't be done", well, they have had years or decades of time invested in the thing, and in that instance I should be listening to and learning from their advice rather than going "actually no it can". I see a lot of beginners fall into that second pit.
I myself made that mistake at the tender age of 14, when I was of the opinion that "actually, if I just found a reversible hash, I'll have solved compression!", which I think we all here know is bullshit. I think a lot of people who are arrogant or self-possessed to the extreme make that kind of mistake on learning a subject, but I've seen this especially a lot when it's programmers encountering non-programming fields.
Finally, tying that point back to AI — I've seen a lot of people who are unfamiliar with something decide to use AI instead of talking to someone experienced, because the AI makes them feel like they know the field rather than telling them their assumptions and foundational knowledge are incorrect. Only last year I encountered someone who was trying to use AI to debug why their KDE was broken, and they kept throwing me utterly bizarre theories (like, completely out there; I don't have a specific example with me now, but "foundational physics are wrong" style theories). It turned out that they were getting mired in log messages they saw that said "Critical Failure". Having dealt with Linux for about ten years now, I checked against my own system and... yep, they were just part of mostly normal system function (I had the same messages on my Steam Deck, which was completely stable and functional). The real fault was buried halfway through the logs. At no point was this person able to tell what was important versus not important, and the AI had absolutely no way to tell or to understand the logs in the first place, so it was like a toaster leading a blind man up a mountain. I diagnosed the correct fault in under a day by just asking them to run two commands and skimming the logs. That's experience, and that's irreplaceable by a machine as of the current state of the world.
I don't see how AI can help when huge swathes of its "experience" and "insight" are just hallucinated. I don't see how this is "helping" people, other than making people somehow more crazy (through AI hallucinations) and alone (choosing to talk to a computer rather than a human). |
| |
| ▲ | alpaca128 21 hours ago | parent | next [-] | | There are use-cases where hallucinations simply do not matter. My favorite is finding the correct term for a concept you don't know the name of. Googling is extremely bad at this as search results will often be wrong unless you happen to use the commonly accepted term, but an LLM can be surprisingly good at giving you a whole list of fitting names just based on a description. Same with movie titles etc. If it hallucinates you'll find out immediately as the answer can be checked in seconds. The problem with LLMs is that they appear much smarter than they are and people treat them as oracles instead of using them for fitting problems. | | |
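(A minimal sketch of that "name the concept" use case, assuming a hypothetical `ask` callable standing in for whatever LLM client you use; the prompt wording is illustrative, not any specific API.)

```python
# Sketch only: `ask` is a placeholder for your LLM client, not a real API.
PROMPT = (
    "I'm describing a concept whose proper name I don't know:\n"
    "{description}\n"
    "List five terms that might be the accepted name for it, one per line."
)

def candidate_terms(description: str, ask) -> list[str]:
    """Return candidate names for a described concept; each candidate should
    then be verified with an ordinary search, which only takes seconds."""
    reply = ask(PROMPT.format(description=description))
    return [line.strip("-* ").strip() for line in reply.splitlines() if line.strip()]
```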
| ▲ | skydhash 20 hours ago | parent [-] | | Maybe I read too many encyclopedias, but my current workflow is to explore introductory material. Open a database textbook, for example, and you'll find all the jargon there. Curated collections can get you there too. Books are a nice example of this, where we have both the table of contents for general-to-particular concept navigation, and the index for keyword-based navigation. | |
| ▲ | fao_ 4 hours ago | parent [-] | | Right! The majority of any 101 book will be enough to understand the jargon, but the above poster's comment looks past the fact that often knowing what term to use isn't enough; it's knowing the context and usage around it too. And who's to say the AI isn't bullshitting you about all or any of that? If you're learning the information, then you don't know enough to discern negatively-valued information from any other kind. |
|
| |
| ▲ | bethekidyouwant 16 hours ago | parent | prev | next [-] | | It’s really useful for summarizing extremely long comments. | |
| ▲ | JCattheATM 17 hours ago | parent | prev | next [-] | | > What's the point of using AI to do research when 50-60% of it could potentially be complete bullshit. Because if you know how to spot the bullshit, or, better yet, word prompts precisely enough that the answers aren't bullshit, it can be an immense time saver. | |
| ▲ | fao_ 4 hours ago | parent [-] | | > better yet word prompts accurately enough that the answers don't give bullshit The idea that you can remove the bullshit by simply rephrasing also assumes that the person knows enough to know what is bullshit. This has not been true from what I've seen of people using AI. Besides, if you already know what is bullshit, you wouldn't be using it to learn the subject. Talking to real experts will win out every single time, both in time cost, and in socialisation. This is one of the many reasons why networking is a skill that is important in business. |
| |
| ▲ | CamperBob2 a day ago | parent | prev [-] | | What's the point of using AI to do research when 50-60% of it could potentially be complete bullshit. You realize that all you have to do to deal with questions like "Marathon Crater" is ask another model, right? You might still get bullshit but it won't be the same bullshit. | | |
| ▲ | thatjoeoverthr 21 hours ago | parent | next [-] | | I've been thinking about a self-verification method based on this principle lately. Any specific-enough claim, e.g. „the Marathon crater was discovered by …”, can be reformulated as a Jeopardy-style prompt, „This crater was discovered by …”, and you can check for a failure to match. You need some raw intelligence to break it down, though. | |
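(A minimal sketch of that self-check, with a placeholder `ask` function standing in for an actual LLM call; the example prompt and entity are hypothetical.)

```python
def ask(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. via an SDK or HTTP client)."""
    raise NotImplementedError

def jeopardy_check(masked_prompt: str, expected_entity: str) -> bool:
    """Re-ask a claim with its key entity blanked out, e.g.
    masked_prompt   = "Which NASA rover reached a feature named Marathon in 2015?"
    expected_entity = "Opportunity"
    A mismatch suggests the original claim doesn't hold up."""
    answer = ask(masked_prompt)
    return expected_entity.lower() in answer.lower()
```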
| ▲ | Night_Thastus a day ago | parent | prev [-] | | Without checking every answer it gives back to make sure it's factual, you may be ingesting tons of bullshit answers. In this particular case, model A may get it wrong and model B may get it right, but that can be reversed for another question. What do you do at that point? Pay to use all of them and find what's common in the answers? That won't work if most of them are wrong, like in this example. If you're going to have to fact check everything anyways...why bother using them in the first place? | |
| ▲ | CamperBob2 21 hours ago | parent [-] | | If you're going to have to fact check everything anyways...why bother using them in the first place? "If you're going to have to put gas in the tank, change the oil, and deal with gloves and hearing protection, why bother using a chain saw in the first place?" Tool use is something humans are good at, but it's rarely trivial to master, and not all humans are equally good at it. There's nothing new under that particular sun. | | |
| ▲ | Night_Thastus 21 hours ago | parent [-] | | The difference is consistency. You can read a manual and know exactly how to oil and refill the tank on a chainsaw. You can inspect the blades to see if they are worn. You can listen to it and hear how it runs. If a part goes bad, you can easily replace it. If it's having trouble, it will be obvious - it will cut wood more slowly or stop working altogether. The situation with an LLM is completely different. There's no way to tell that it has a wrong answer - aside from looking for the answer elsewhere, which defeats its purpose. It'd be like using a chainsaw all day and not knowing how much wood you cut, or if it just stopped working in the middle of the day. And even if you KNOW it has a wrong answer (in which case, why are you using it?), there's no clear way to 'fix' it. You can jiggle the prompt around, but that's not consistent or reliable. It may work for that prompt, but that won't help you with any subsequent ones. | |
| ▲ | CamperBob2 21 hours ago | parent [-] | | The thing is, nothing you've said is untrue for any search engine or user-driven web site. Only a reckless moron would paste code they find on Stack Overflow or Github into their project without at least looking it over. Same with code written by LLMs. The difference is, just as the LLM can write unit tests to help you deal with uncertainty, it can also cross-check the output of other LLMs. You have to be careful when working with powerful tools. These tools are powerful enough to wreck your career as quickly as a chain saw can send you to the ER, so... have fun and be careful. | | |
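(A rough illustration of the cross-checking idea mentioned above; the model names and the `ask_model` helper are stand-ins, not real endpoints.)

```python
def ask_model(model: str, question: str) -> str:
    """Placeholder: wrap whichever SDKs or HTTP clients you actually use."""
    raise NotImplementedError

def cross_check(question: str, models=("model-a", "model-b")):
    """Put the same factual question to two different models and flag
    disagreement for manual review. Naive string comparison; in practice
    you'd compare extracted entities or ask a third model to judge."""
    answers = {m: ask_model(m, question) for m in models}
    normalized = [a.strip().lower() for a in answers.values()]
    agree = all(a == normalized[0] for a in normalized)
    return answers, agree
```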
| ▲ | skydhash 20 hours ago | parent [-] | | The nice thing about SO and Github is that there's little to no reason for things not to work, at least in the context where you found the code. The steps are getting the context, assuming it's true based on various indicators (mostly reputation), and then moving on to understanding the snippet. But with LLMs, every word is a probability factor. Assuming the first paragraph is true has no bearing on the rest. |
|
|
|
|
|
|
|
| ▲ | silverquiet a day ago | parent | prev [-] |
| > (Obviously all of that is bullshit.) It isn't obvious to me - that is rather plausible and a cute story. |